Abstract: This paper describes the quantitative application of the
theory of System Health Management and its operational subset, 
Fault Management, to the selection of Abort Triggers for a 
human-rated launch vehicle, the United States’ National 
Aeronautics and Space Administration’s (NASA) Space Launch 
System (SLS). The results demonstrate the efficacy of the theory 
to assess the effectiveness of candidate failure detection and 
response mechanisms to protect humans from time-critical and 
severe hazards. The quantitative method was successfully used on 
the SLS to aid selection of its suite of Abort Triggers. 
I. Introduction

System Health Management (SHM) is a field of endeavor 
that addresses activities that are described under several names, 
including Prognostics and Health Management, Fault 
Protection, Vehicle Health Monitoring and/or Management, 
Fault Detection, Isolation, and Response, Diagnostics, 
Maintainability, Reliability, Availability, aspects of Safety, as 
well as others. It has historically been a relatively ad hoc set of 
processes and technologies that aim to predict, detect, 
diagnose, and respond to failures. In the last four years (though 
the core ideas go back nearly 20 years [1]), a unifying theory of 
SHM has been developed and published in a reference text [2] 
and papers [1][3][4]. This theory provides a conceptual 
framework for the field and for its operational subset, Fault 
Management (FM), based on the idea that FM theory and
practice are essentially an extension of control theory and
practice into the off-nominal domain. This paper describes the 
application of that theory on the National Aeronautics and 
Space Administration’s (NASA) Space Launch System (SLS) 
program, demonstrating that the theory is successful in
providing a firm conceptual and practical basis for estimating 
the value of SHM/FM designs for a system. 

The purpose of SHM is to “preserve the system’s ability to 
function as intended.” SHM provides the capabilities that 
preserve functionality, and can be divided into passive 
capabilities such as design margins and operational capabilities 
such as failure detection, isolation, and response (FDIR). These 
latter operational capabilities are termed Fault Management, 
and are implemented as control loops, known as FM Control 
Loops (FMCLs). The FMCL detects that all or part of a system 
is now failed, or in the future will fail (that is, cannot be 
controlled within acceptable limits to achieve its objectives), 
and takes a control action (a response) to return the system to a 
controllable state [2].

As a type of control loop, aspects of control theory can be 
applied to understanding FMCLs. Control theory divides 
control loops into two major portions: state estimation and state 
control. Performance of control loops is also divided into two 
pieces, with separate metrics to determine the performance of 
state estimation and state control. For FMCLs, state estimation 
can be measured and assessed using “confusion matrix” 
parameters: false positive, false negative, true positive and true 
negative. State control assessments are based on the speed of 
the FM response compared to the current or impending failure 
effects that it mitigates. Figure 1 shows FM functions arranged
as a control loop. Typical FM functions such as prognostics,
detection, and diagnostics are state estimation functions, while
decision and response functions, such as determining what
actions to take, failure masking, recovery, and avoidance, are
state control functions. The implication of this control-based
theory is that the value of FM designs must be assessed in the
same way as system control functions in general: the designs
are divided into state estimation and control portions with their
accompanying metrics, and their value is ultimately assessed in
terms of the goals that each FM Control Loop is trying to
protect.

Fig. 1. Fault Management Functions as State Estimation and Control.

This paper describes how this theory has been successfully 
applied on the NASA SLS Program to quantitatively assess the
effectiveness of proposed Abort Triggers so as to select the
most effective suite to protect the astronauts from catastrophic
failure of the SLS vehicle. An Abort Trigger, in the context of
SLS, is the means by which the SLS detects a crew-threatening 
failure and sends a recommendation to the Multi-Purpose Crew 
Vehicle (MPCV) to initiate an abort response. An abort 
response during ascent enables the MPCV with its astronaut 
crew to escape from a failing SLS and safely return to Earth. 
The success or failure of the abort is ultimately measured by 
the probability that the crew returns safely to Earth in situations 
when failure threatens their safety. The value of an Abort
Trigger is assessed by its contribution to enabling the MPCV
and crew to escape the SLS-caused threat, thereby
minimizing the likelihood of Loss of Crew (LOC). The
effectiveness of Abort Triggers is one important factor in the
calculation of LOC probability, and hence in verifying the
program LOC requirements.

Returning to the theory described above, the value of the 
particular FM mechanisms called Abort Triggers for the SLS 
Program must be measured in terms of their ability to protect 
the crew, which is estimated by determining the change in LOC 
probability that occurs if a particular Abort Trigger or suite of 
Abort Triggers is implemented. The calculation of this change 
in LOC probability, which we call “LOC Benefit,” is 
performed in a manner similar to classical control theory in the 
sense that the problem and the calculation are subdivided into 
state estimation and state control portions. State estimation is 
measured by false positive, false negative, true positive, and 
true negative metrics. State control metrics are measured by the 
probability of making correct response decisions and by the 
effectiveness of the response to the failure (or in more general 
terms, the “disturbance”) that the FMCL is mitigating. In the
required calculations, the state estimation and state control
portions are calculated and then combined to generate the LOC
Benefit numbers associated with proposed Abort Trigger
implementations. This in turn provides a quantitative basis for
deciding which Abort Triggers will be provided on the
human-rated launch vehicle.

The methods described in the remainder of this paper 
provide one crucial set of information to risk-informed design 
and to the Probabilistic Risk Assessment (PRA) methods that 
support it: the effectiveness of FMCLs to mitigate the effect of 
failures. This paper describes one particular example of this 
general problem: the calculation of the improvement to crew 
safety gained (measured as LOC Benefit) by adding Abort 
Triggers to the design. The value of Abort Triggers must be
compared to the cost of adding them, because each Abort
Trigger added carries a probability of a false positive, which
leads to an additional chance of a Loss of Mission
(LOM), which in turn generates a small additional probability
of LOC. We are unaware of any full application of these ideas 
for entire FMCLs for a system, though there have been 
numerous assessments of parts of FMCLs in many systems. 
We believe that the assessments and metrics applied here may 
be the first such application for entire suites of FMCLs for a 
large complex system such as a launch vehicle and crew 
capsule. 

Because the actual data used for this calculation in the SLS 
Program cannot be disclosed for general publication, after the 
methods used for doing the calculations are described in detail, 
a sample calculation using fictitious numbers is provided to 
demonstrate how the quantitative methodology is applied in 
practice. On the SLS Program, the actual calculations were 
performed and the resulting data used as part of the decision 
processes for deciding on the Abort Trigger suite to be used on 
the launch vehicle. 

Even though this paper describes the application of FM 
theory to the assessment and selection of a launch vehicle 
Abort Trigger suite, the authors are convinced that the theory 
and the methodology described here, appropriately tailored to 
the application, applies to any system in which FM is applied to 
predict, detect, and respond to failures. This paper describes the 
kinds of quantitative metrics by which FM is assessed for state 
estimation and state control, and demonstrates typical issues 
involved in applying those metrics to FM design and 
operations. It therefore provides insights that can be applied to 
any complex system in which prospective or current failures 
must be mitigated. 

II. Fault Management Metrics and Application to 
Human-Rated Launch Vehicle Abort Triggers 

A. Abort Conditions and Triggers 

NASA’s SLS, managed and integrated by Marshall Space 
Flight Center (MSFC), is intended to fly both humans and 
cargo, though unlike the Space Shuttle, only on separate SLS 
missions. MSFC’s Integrated System Health Management and 
Automation Branch is developing the Mission and Fault 
Management (M&FM) function for the SLS. M&FM is 
responsible for leading the functional integration and 
implementation of the SLS mission and fault management 
decision logic, including redundancy management, safing,
caution & warning, and abort recommendation algorithms.
M&FM is also responsible for the quantitative analysis which 
integrates program-wide supporting data and provides the 
rationale for the selection of Abort Triggers for the SLS 
vehicle, which is the focus of this paper. 

For the crewed, human-rated SLS configuration, an Abort
Trigger is a specific type of failure detection that detects the 
existence of an “abort condition,” which is a state or behavior 
whose existence implies a current or impending threat to crew 
safety. Most crew threatening failures ultimately result in one 
of three major situations: explosions of the launch vehicle, loss 
of control of the launch vehicle, or inability to achieve orbit 
even though the vehicle is otherwise able to fly normally. The 
need to escape from an exploding launch vehicle is obvious. 
Loss of control usually leads to an explosion as well, but the 
immediate threat is that the crew will be unable to safely abort 
off the launch vehicle that is oriented in the wrong direction. 
Finally, an abort is ultimately needed if the MPCV will be 
unable to achieve orbit. However, in these relatively benign
scenarios, the MPCV generally has anywhere from a few 
seconds to a few minutes to abort from the stable launch 
vehicle. 

While some Abort Triggers directly detect these three 
ultimate situations, some Abort Triggers detect precursors to 
these situations. Ideally, the latter are preferred because of the
warning time they provide. However, except for limited failure
scenarios, detecting all precursors with certainty is not currently
technologically feasible. Triggers that pick up loss of control
include decisions based on vehicle attitude error exceeding
its controllability threshold. Other Abort Triggers
detect loss of communication with key components, which then 
cannot be controlled, which will eventually cause loss of 
vehicle control. These same triggers can also be indirect 
indicators of a structural collapse or explosion that has 
destroyed the components or communication to those 
components. Other Abort Triggers detect conditions relating to 
high or low solid rocket booster pressures or liquid propellant 
rocket engine temperatures. These indicate impending rupture 
of the solid rocket boosters or combustion chamber explosions. 

When an Abort Trigger detects an abort condition, there are 
two possible situations. The first is that an abort is needed 
immediately. In this situation, it sends a message, called an 
“abort recommendation,” to the MPCV. The second situation is 
when an abort is not required immediately, but will be required 
eventually. This happens in cases of slow-developing failures, 
or in cases in which the SLS will not achieve the desired orbit. 
In these cases, the SLS sends a warning message, not an abort 
recommendation. 

In either case, only the MPCV or the crew inside the 
MPCV can initiate an abort action. The Flight Director in the 
Mission Control Center can also command the crew to initiate 
an abort based on telemetry data received from the SLS and 
MPCV. For situations that require an immediate abort, as 
designated by the SLS abort recommendation message, a set of 
MPCV computer algorithms, known as the abort decision logic,
will immediately and automatically initiate an abort. In cases in 
which the abort response is not required immediately, the crew 
has time to select the best time to abort. 


The process used to identify abort conditions and potential 
Abort Triggers includes a variety of methods. Since SLS is 
composed of both new and existing hardware and software, 
some abort conditions and potential triggers are readily 
identified since they were defined or used in this fashion on 
prior programs. For example, the RS-25 Core Stage Engine 
(CSE) used on SLS is the same as was used on the Space 
Shuttle, but with upgraded controller electronics. These Space 
Shuttle Main Engines have the capability to detect impending 
engine failures and shut themselves down through their failure
detection and response capabilities, and these demonstrated
capabilities are maintained for the SLS. Similarly, the Space
Shuttle and Constellation Ares I programs both used Solid 
Rocket Boosters and provided sensors and other hardware that 
can be used to detect booster pressures. This is an obvious 
candidate for use on the SLS as well. 

Other abort conditions and potential Abort Triggers must be 
identified by assessment of new or evolving SLS designs. This 
is particularly true for the SLS Core Stage, which houses most 
of the avionics that controls the entire launch vehicle, with the 
exception of the Interim Cryogenic Propulsion Stage, which 
acts as the upper stage for the first two SLS flights. For 
example, if the Core Stage Flight Computers have internal 
software failures, or fail too many strings of their redundant
hardware, then they will be unable to control the vehicle and 
hence the MPCV must abort. Other potential triggers include 
detection of loss of control from the Guidance, Navigation, and 
Control System, of thrust vector control system gimbal angles, 
liquid oxygen tank and liquid hydrogen tank pressures, etc. 
These potential triggers are identified by considering the 
consequences of failure of the various boxes, with respect to 
ultimate crew safety on the MPCV. 

One tool used to identify candidate Abort Triggers and to 
determine their coverage against mission and crew safety goals 
is the Goal-Function Tree (GFT). This representation provides 
a hierarchical representation of system goals and functions 
rigorously modeled using state variables. It enables a top-down 
assessment of the coverage of FM mechanisms to detect 
failures that can compromise system goals. For every goal that 
must be achieved, there is the possibility that it is not achieved. 
If it is critical to take action if the goal is not achieved, then the 
system designer can place failure detection at that point. In 
turn, this detection can activate a failure response. By 
reviewing the impact of failure along all paths up the GFT, the 
FM engineer can design a suite of failure detections that ensure 
that all paths up the GFT are covered, and can provide 
preliminary information regarding how much warning time 
they provide for a response to be activated compared to the 
failure effects they are attempting to mitigate. For SLS, the 
GFT was used to assess the coverage and physical relationships 
of Abort Triggers. These relationships include understanding of 
whether two or more triggers existed along any given GFT 
path, which means that more than one trigger can detect failure 
effects for a given Loss of Mission scenario [5].
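
As a rough illustration of the coverage check described above, the following Python sketch represents a goal tree with parent links and lists, for each path from a lowest-level goal up to the top goal, the goals at which a failure detection has been placed. The goal names and trigger placements are invented for illustration and are not taken from the SLS GFT; the state-variable modeling that the actual GFT relies on [5] is not reproduced here.

```python
# Hypothetical goal tree: child goal -> parent goal (None marks the top goal).
# Names are illustrative only and do not come from the SLS GFT.
PARENT = {
    "crew_safe": None,
    "vehicle_controlled": "crew_safe",
    "orbit_achieved": "crew_safe",
    "attitude_within_limits": "vehicle_controlled",
    "thrust_sufficient": "orbit_achieved",
}

# Hypothetical goals at which a failure detection (trigger) has been placed.
DETECTED = {"vehicle_controlled", "attitude_within_limits"}


def path_to_top(goal, parent=PARENT):
    """Return the goals on the path from `goal` up to the top-level goal."""
    path = []
    while goal is not None:
        path.append(goal)
        goal = parent[goal]
    return path


def coverage(parent=PARENT, detected=DETECTED):
    """Map each lowest-level goal to the detections found along its upward path."""
    parents = set(parent.values())
    leaves = [g for g in parent if g not in parents]
    return {
        leaf: [g for g in path_to_top(leaf, parent) if g in detected]
        for leaf in leaves
    }


if __name__ == "__main__":
    for leaf, hits in coverage().items():
        status = "uncovered" if not hits else f"{len(hits)} detection(s): {hits}"
        print(f"{leaf}: {status}")
```

In this toy tree, one path carries two detections (more than one trigger can see the same failure effects) and the other carries none, which is the kind of gap or overlap the coverage review is meant to expose.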

Once identified and described, these candidate Abort 
Triggers must be evaluated to determine if they provide 
sufficient benefit to crew safety to warrant inclusion in the SLS 
design. The evaluation process is based on the understanding of
the launch vehicle risks through past failures, various design
and safety analyses, such as Guidance, Navigation, and Control
(GN&C) controllability, Failure Mode and Effects Analysis
(FMEA) and Hazard Analysis, Probabilistic Risk Assessment
(PRA) and/or engineering judgment, and trades between
technical capability, schedule, cost, benefit and risks. Because
it is not technically and financially feasible to monitor all 
possible failure modes and to implement all Abort Triggers, a 
Risk-Informed Decision Making (RIDM) process, which is part 
of NASA’s System Engineering Process, is implemented to 
emphasize the proper use of risk analysis in its broadest sense 
to make risk-informed decisions [6]. This is not new for 
NASA, or for engineering more generally. However, the
quantitative analysis described in this paper to perform
risk-informed decision-making involves several more engineering
organizations in a comprehensive, in-depth quantitative 
analysis of Fault Management than has been performed to date. 

B. Abort Trigger Relationships to Redundancy Management,
Safing, and Caution & Warning

In abort scenarios, frequently several other FM actions also 
occur. These include safing actions, Redundancy Management 
actions, and Caution & Warning notifications. 

For any launch vehicle, history shows that the highest 
probability failures are those related to its propulsion system. 
For SLS, this includes the liquid oxygen and liquid hydrogen 
tanks, the plumbing to move the propellants to the liquid 
propellant engines, and the turbopumps, propellant injection, 
and combustion. However, liquid propellant engines have a
major safety advantage compared to solid rocket motors insofar
as they can be shut down. As previously stated, the SLS CSEs
have the capability to detect failures and respond by shutting
themselves down, i.e., engine redline protections.

The CSE shutdown response is considered a “safing” 
response. Safing is defined as an action to change system 
configuration, state, or goals to protect humans or assets. A 
CSE shutdown clearly fits the definition, as it changes the 
system configuration by removing failed CSEs from use. In 
doing so, it prevents an uncontained engine failure, which protects
the rest of the launch vehicle and the MPCV. A shutdown, if 
successful, potentially prevents a catastrophic explosion. 

For the SLS, if two or more CSEs shut down, the launch 
vehicle will either be unable to perform attitude control, or the 
launch vehicle will be unable to boost the MPCV to the desired 
orbit. In either event, an abort will be necessary. If only one 
CSE is shut down, then except for scenarios in which the 
shutdown occurs near the start of ascent, the launch vehicle can 
generally maintain control and the MPCV can reach orbit. 
From an analysis viewpoint, single engine shutdowns early in
the mission and multiple engine shutdowns are abort
conditions. Single engine shutdowns, if they occur in the 
middle or end of the ascent, are not abort conditions because 
the MPCV remains safe and the MPCV can achieve nominal 
orbit. When a single engine shutdown is early in the mission 
and orbit cannot be achieved, the SLS sends a warning message 
to the MPCV, which signifies that an abort will be needed, but 
not immediately. If two or more engines shut down, then abort 
will be needed immediately and an abort recommendation is 
sent. For the purposes of analysis, the probability of successful
CSE detection and shutdown is an important factor in the
overall assessment of SLS Abort Triggers.
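
The engine-out rules in this paragraph can be summarized in a short Python sketch. It is a simplification for analysis purposes only: the function name, the message labels, and the `orbit_achievable` input (assumed here to be supplied by some separate performance assessment) are hypothetical and do not describe the SLS flight software.

```python
from enum import Enum


class Message(Enum):
    NONE = "no abort condition"
    WARNING = "warning: abort needed, but not immediately"
    ABORT_RECOMMENDATION = "abort recommendation: abort immediately"


def classify_cse_shutdown(engines_shut_down: int, orbit_achievable: bool) -> Message:
    """Classify a CSE shutdown scenario per the rules stated in the text.

    `orbit_achievable` is a hypothetical input standing in for the
    early-versus-late distinction: an early single shutdown is one after
    which the desired orbit can no longer be reached.
    """
    if engines_shut_down < 0:
        raise ValueError("engines_shut_down must be non-negative")
    if engines_shut_down >= 2:
        # Two or more engines out: control or orbit is lost, abort immediately.
        return Message.ABORT_RECOMMENDATION
    if engines_shut_down == 1 and not orbit_achievable:
        # Single early shutdown: abort eventually, so only a warning is sent.
        return Message.WARNING
    # No shutdown, or a single shutdown from which orbit is still reachable.
    return Message.NONE


if __name__ == "__main__":
    for n, ok in [(0, True), (1, True), (1, False), (2, True)]:
        print(n, ok, "->", classify_cse_shutdown(n, ok).value)
```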

CSE shutdowns are not the only potential safing actions for 
the SLS. During ascent, there are failure modes that result in
the inability of one of the boosters to separate. If this occurs, it
may be desirable to prevent both boosters from separating, as 
the launch vehicle is far more stable with two boosters 
remaining on the vehicle than only one. Prior to launch, there 
are a variety of safing actions that can occur, to prevent 
hazardous events from occurring. These will not be discussed 
in this paper, as these do not result in aborts. 

Redundancy Management (RM) also plays a significant 
role in the analysis of Abort Triggers. This is because 
successful RM actions enable the mission to continue to 
successful completion. For example, if one of the two 
Combined Control System Electronics (CCSE) boxes fails, the 
RM action will remove the failed CCSE from use, enabling the 
remaining CCSE to continue to operate. If the remaining CCSE 
fails, then the SLS will send an abort recommendation, because 
the ascent can no longer be continued safely. 

Analytically, the successful RM response to a failure of one 
of two CCSEs is accounted for through a probabilistic estimate 
of the reliability of the redundant CCSE. If both CCSEs fail, 
this results in a LOM and consequent abort. If only one of the 
two fails, then the mission continues and no LOM results. 
Abort Triggers come into play only if a LOM occurs (including
a LOM caused by a False Positive of an Abort Trigger in a
mission that otherwise would have succeeded), and never when
the mission continues successfully.
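
To make the redundancy accounting concrete, the sketch below computes the per-mission probability that both units of a redundant pair fail, which is the case that produces a LOM in the CCSE example. It assumes the two failures are independent and that the RM switchover itself always works; common cause failures and RM errors are deliberately left out, and the numbers in the usage example are fictitious.

```python
def pair_lom_probability(p_fail_a: float, p_fail_b: float) -> float:
    """Per-mission probability that both units of a redundant pair fail.

    Assumption (not from the source): the two failures are independent and
    the RM action that removes the first failed unit always succeeds.
    """
    for p in (p_fail_a, p_fail_b):
        if not 0.0 <= p <= 1.0:
            raise ValueError("probabilities must lie in [0, 1]")
    return p_fail_a * p_fail_b


if __name__ == "__main__":
    # Fictitious per-mission failure probabilities for the two boxes.
    p_single = 0.01
    print("single-string LOM contribution:", p_single)
    print("redundant-pair LOM contribution:", pair_lom_probability(p_single, p_single))
```

Under these assumptions, a single failure contributes nothing to LOM because the mission continues on the remaining unit; only the joint failure does.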

Finally, Caution & Warning (C&W) notifications are 
another aspect of the design worthy of mention. As described 
above, abort conditions that do not require an immediate abort 
are implemented as warning messages. Caution messages are 
sent from the SLS to the MPCV when a failure occurs that 
degrades SLS safety margins. Examples of this include the 
failure of a single Core Stage (CS) Flight Computer (FC) of the 
suite of three FCs, or a propellant tank pressure that is higher or 
lower than expected, but not yet reaching safety margins that 
necessitate an abort. Warning messages and resultant
non-immediate aborts must be accounted for in the analysis of
Abort Triggers. Caution messages are only indirectly assessed 
insofar as they are related to the RM actions described above, 
which affect reliability of redundant component suites, and 
hence the calculation of LOM probabilities. 

C. Abort Trigger Quantitative Metrics: LOC Benefit 

As described in the Introduction, FM is implemented as a 
suite of control loops that monitor state variables, determine if 
the states of these state variables indicate current or future 
failure, determine the location of the failure cause (isolation), 
decide on appropriate responses, and then execute these 
responses. Quantitative metrics that enable assessment of FM 
performance relate to these functions. While in general we can 
divide FM metrics into state estimation and state control 
metrics, the specifics of the metrics must be tailored to the 
application. 

Since FM exists to preserve the system functionality that
achieves system goals, the designer must determine the goals
and functions of the system that any given FM design is
intended to protect. A given FM control loop might protect all 
of the top-level system goals, or it might protect some subset of 
the top-level goals or lower level goals. For example, for a
real-time triplex voting computer system, the triplex voting (which
detects a computer that is providing incorrect data and removes 
that data and/or the failed computer from use) preserves the 
system’s computing functions, which usually exist to support 
higher-level system goals. 
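
The triplex voting example can be sketched as a simple majority voter. This is a minimal illustration in Python that assumes exact-match comparison of three channel outputs; real flight computers typically compare within tolerances and over time windows, which is not modeled here, and the function name is hypothetical.

```python
from collections import Counter
from typing import Hashable, List, Optional, Sequence, Tuple


def triplex_vote(outputs: Sequence[Hashable]) -> Tuple[Optional[Hashable], List[int]]:
    """Return (majority value, indices of channels that disagree with it).

    Assumes exact equality between channel outputs. If no two channels
    agree, there is no majority and all three channels are flagged.
    """
    if len(outputs) != 3:
        raise ValueError("triplex voting expects exactly three channel outputs")
    value, count = Counter(outputs).most_common(1)[0]
    if count < 2:
        return None, [0, 1, 2]
    dissenters = [i for i, v in enumerate(outputs) if v != value]
    return value, dissenters


if __name__ == "__main__":
    print(triplex_vote([5, 5, 7]))  # channel 2 disagrees and can be removed from use
    print(triplex_vote([4, 4, 4]))  # all channels agree
```

The voter both masks the bad data (the majority value is used) and identifies the failed channel, which is what preserves the computing function for the higher-level goals.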

Since Abort Triggers exist to protect the crew, for a
human-rated launch vehicle such as SLS, the highest-level metric or
measure of value of an Abort Trigger is the probability that the 
Abort Trigger and the resulting abort response enable the crew 
to escape the hazard and return to Earth. If no Abort Trigger 
exists to detect an abort condition, then the MPCV and crew 
will not escape the hazard or return to Earth. The Abort 
Triggers could be on the SLS, on the MPCV, with launch or 
mission operations on the ground, or could even be the crew 
itself. If an abort condition exists, an abort response must be 
taken, and this cannot occur unless the abort condition is 
detected, which is the purpose of an Abort Trigger. The value 
of the Abort Trigger in the analysis methodology described in 
this paper is measured as a Loss of Crew (LOC) probability per 
mission. To be precise, the measure of an Abort Trigger value 
is called “Loss of Crew Benefit,” or LOC Benefit, which is the
per-mission probability that a given Abort Trigger saves the
crew. In the NASA SLS Probabilistic Risk Assessment (PRA),
LOC Benefit is sometimes called “Abortable LOM,” the
per-mission probability that a LOM can be successfully “aborted.”

For LOC Benefit to be estimated, the Abort Trigger must 
detect the failure, which estimates the state (i.e. does an abort 
condition exist), and the abort response must succeed in 
removing the MPCV and crew from the hazard. Metrics are 
required for both, which follows the general rule that FM 
metrics are necessary for state estimation and state control. 
State estimation determines the current or future existence of 
the failure, and isolates (determines) the location of the failure 
to the necessary level of granularity. State control consists of 
deciding what response to take, and the effectiveness of the 
response. Between them, the state estimation and state control 
metrics must combine to calculate the LOC Benefit number. 

State estimation metrics are based on the quad of True
Positive, True Negative, False Positive, and False Negative. 
These are defined here. 

• False Negative: An incorrect decision that a 
condition does not exist, when it actually does exist. 

• False Positive: An incorrect decision that a condition 
exists, when it actually does not exist. 

• True Negative: A correct decision that a condition 
does not exist, when it actually does not exist. 

• True Positive: A correct decision that a condition 
exists, when it actually does exist. 

For detection of abort conditions, the condition to which 
these definitions refer is not merely a failure, but a failure 
whose current or later effects will threaten the crew and require 
an abort response. Each of these metrics is specified as a 


probability. For False Positives, it is specified as a false 
positive probability per mission. For False Negatives, it is 
specified as a false negative probability per failure occurrence. 
True Positives and Negatives work similarly: a True Positive is 
specified per failure occurrence, and True Negative is specified 
per mission. 
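
The bookkeeping for these four probabilities can be sketched as follows. The sketch assumes that the probabilities are complementary within each conditioning basis (True Positive = 1 - False Negative per failure occurrence, and True Negative = 1 - False Positive per mission); that relationship is an interpretation of the definitions above rather than a formula stated in this paper, and the class name and numbers are illustrative.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TriggerEstimationMetrics:
    """State estimation metrics for one hypothetical Abort Trigger."""

    p_false_negative: float  # per failure occurrence (abort condition present)
    p_false_positive: float  # per mission (abort condition absent)

    def __post_init__(self):
        for name in ("p_false_negative", "p_false_positive"):
            p = getattr(self, name)
            if not 0.0 <= p <= 1.0:
                raise ValueError(f"{name} must lie in [0, 1]")

    @property
    def p_true_positive(self) -> float:
        # Assumed complement of the false negative, per failure occurrence.
        return 1.0 - self.p_false_negative

    @property
    def p_true_negative(self) -> float:
        # Assumed complement of the false positive, per mission.
        return 1.0 - self.p_false_positive


if __name__ == "__main__":
    # Fictitious values, for illustration only.
    m = TriggerEstimationMetrics(p_false_negative=0.02, p_false_positive=0.001)
    print("TP per failure occurrence:", m.p_true_positive)
    print("TN per mission:", m.p_true_negative)
```

Keeping the two bases separate matters because the per-failure quantities are later weighted by the probability that the failure occurs, while the per-mission quantities are not.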

The next part of state estimation is fault diagnosis, which 
includes both fault isolation and fault identification. The former 
refers to determining the location of the cause of the detected 
failure effects, with a specified level of granularity. The latter 
refers to the specific failure mode or cause of the detected 
failure effects. For the on-board and immediate purpose of 
enabling the crew to escape from safety-critical hazards, 
identifying the cause of the crew-threatening hazard is 
unimportant, though it will matter for the post-flight failure 
investigation. Determining the location of the failure cause is 
also unimportant, as it only matters that the failure effects are 
somewhere on the launch vehicle. Once again, for post-flight 
analysis, it will be important to determine the location of the
failure causes. Though fault diagnosis is a critical part of FM in 
general, for the purposes of Abort Trigger analysis it is not a 
significant issue and thus will not be discussed any further in 
this paper. 

The next metric of potential relevance is the correctness of 
the selection of which response to take. In the case of abort 
conditions and triggers, during launch vehicle ascent there are 
only two possible responses: to abort immediately, or to abort 
eventually. From an implementation standpoint, the SLS 
distinguishes these two possibilities. For situations that require 
an immediate abort, the SLS sends an abort recommendation. 
For situations in which the abort does not need to be taken 
immediately, the SLS sends a warning message to the MPCV. 
Both of these cases are pre-determined before flight, and if the 
selection decision between these two responses is incorrect, it 
is incorrect in the design and analysis of the Abort Triggers. 

The final metric of importance is the effectiveness of the 
abort response. In these cases, there are several factors 
involved. The first is the speed of failure effect propagations, 
which include failure effect propagations internal to the 
vehicle, and the propagations external to the vehicle, such as 
explosion overpressure, fireball, and debris. The second is the 
amount of warning time that the Abort Triggers provide. If 
more warning time is provided, then the MPCV has more time 
to escape the hazardous environment. The third is the criticality 
of these effects. 

For explosion dynamics that consist of the three major 
factors of overpressure, fireball, and debris, the criticality of the 
impact to the crew can vary significantly over the course of 
ascent. During ascent, ambient air pressure, velocity through 
the atmosphere, and dynamic pressure vary greatly, as does the 
amount of propellant remaining in the launch vehicle. One of 
the most significant factors is the amount of debris generated in 
an explosion. In general, larger explosions resulting in more 
debris occur lower in the atmosphere. In turn, more debris 
means a larger probability of this debris striking the MPCV and 
causing LOC. Conversely, as the launch vehicle reaches very
high altitude, much less debris is generated, and the MPCV can
frequently survive the blast even if it has not escaped the
hazardous environment, simply because no debris hits it.

All of these metrics are combined to generate the LOC 
Benefit for an Abort Trigger. The LOC Benefit is the measure 
of the value of the entire FM Control Loop in which the Abort 
Trigger resides. However, for every Abort Trigger added to the 
system, there is the possibility of a False Positive, in which the 
Abort Trigger erroneously determines that an abort condition 
exists and sends an abort recommendation leading to an abort. 
This produces an added probability of LOM, and is an inherent 
cost of an Abort Trigger. In addition, since every abort 
response can fail causing LOC, there is also a small added 
probability of LOC for every abort caused by a False Positive 
abort recommendation. This too must be accounted for in the 
overall calculation of the LOC Benefit, in which the False 
Positive LOC probability must be subtracted from the LOC 
Benefit of successful crew survival based on the existence of 
the Abort Trigger. 
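
The way these pieces combine can be illustrated with a simplified, single-trigger Python sketch. It is not the SLS calculation: it assumes one abort condition, one trigger, and a single lumped abort effectiveness number, and every parameter name and value below is hypothetical. The gross benefit is the probability that the condition occurs, is detected, and the abort succeeds; the False Positive LOC probability is then subtracted, as described above.

```python
def loc_benefit(
    p_abort_condition: float,             # per mission: abort condition occurs
    p_true_positive: float,               # per failure occurrence: trigger detects it
    p_abort_success: float,               # abort saves the crew, given detection
    p_false_positive: float,              # per mission: trigger fires erroneously
    p_loc_given_false_abort: float,       # abort itself causes LOC after a false positive
) -> float:
    """Net per-mission LOC Benefit of one trigger (simplified sketch).

    Assumption: a single lumped abort effectiveness replaces the
    time-dependent warning-time and explosion-environment analysis.
    """
    args = (p_abort_condition, p_true_positive, p_abort_success,
            p_false_positive, p_loc_given_false_abort)
    if any(not 0.0 <= p <= 1.0 for p in args):
        raise ValueError("all inputs must be probabilities in [0, 1]")
    crew_saved = p_abort_condition * p_true_positive * p_abort_success
    false_positive_loc = p_false_positive * p_loc_given_false_abort
    return crew_saved - false_positive_loc


if __name__ == "__main__":
    # Fictitious numbers, for illustration only.
    benefit = loc_benefit(0.01, 0.95, 0.9, 0.001, 0.05)
    print(f"net LOC Benefit per mission: {benefit:.5f}")
```

A trigger whose false positive cost exceeds its gross benefit yields a negative result in this sketch, which corresponds to a trigger that would not warrant inclusion on crew safety grounds.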

D. The Value of the LOC Benefit Calculation 

The calculation of LOC Benefit, compared to the LOM and
LOC costs, is a means of performing quantitative analysis of the
value of Abort Triggers, or more generally, of the value of 
Fault Management and of System Health Management for a 
given system. The theory of SHM clearly indicates that as an 
extension of classical and robust control theory, the 
performance of FM Control Loops can be assessed in ways 
similar to classical control loops, and using similar metrics of 
state estimation and control. However, because the purpose of 
SHM and of FM is to mitigate potential, impending, and actual 
failure, the benefits of SHM/FM must account for the 
probability of failure. If a system were perfectly reliable at 
acceptable cost with a single-string design, no redundancy and 
no FM would be necessary. However, in practice few if any 
large-scale, complex systems are sufficiently reliable in this 
way, which makes FM necessary. Thus assessing the value and 
performance of FM requires estimation of the probability that 
failures will impact the system’s goals [2]. 

Estimating the probability of failure requires methods of 
reliability theory, PRA, and more generally, of risk-informed 
design. In addition to classical methods of estimating the 
reliability of components, other sources of faults, such as 
common cause failure, human error, and software faults, must 
all be considered to estimate the probability of failure of 
hardware, software, and humans. Since FM is implemented through 
FMCL that mitigate the effects of failure, the value of these 
FMCLs inherently depends on the probability of the failures 
that they mitigate. Abort Triggers necessarily exist to mitigate 
risks to the safety of the crew, which is only a subset of the 
failures that can occur in a human-rated launch vehicle. The 
different kinds of hazards to the crew (whether on the ground 
or on-board the MPCV) that can occur, and the probability with 
which these occur, must therefore be estimated. 

The LOC Benefit value for an Abort Trigger in a launch 
vehicle, like similar “benefit” calculations that can be 
performed for other systems for reliability, availability, or 
safety, is useful only in a comparative sense. For human-rated 
launch vehicles, Abort Triggers are useful only in situations in 
which the required orbit or mission success cannot be achieved 
and an abort will be required now or in the future. For the sake 
of argument, assume that the probability of achieving orbit is 
90%, which equates to a LOM probability of 10%. If no abort 
action occurs, these LOM cases will result in Loss of Crew. 
Further, assume that the LOC requirement is set at 1% per 
mission. This means that the Abort Triggers and abort 
responses must reduce the LOC from 10% down to 1% or 
below. The difference between these values is the required 
amount of LOC Benefit that must be provided. Abort Triggers 
are worthwhile only if they provide “significant” value in 
driving LOC down to the required level. 
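
The arithmetic of this example can be written out as a short sketch. The function name is hypothetical, and the calculation follows the text's simplifying assumption that every LOM case without an abort results in Loss of Crew.

```python
def required_loc_benefit(p_lom, loc_requirement):
    """Return (required LOC Benefit, required overall abort effectiveness).

    Assumes, as in the text, that every unaborted LOM case leads to LOC.
    """
    benefit = p_lom - loc_requirement
    # Fraction of LOM cases in which detection plus abort must save the crew.
    effectiveness = benefit / p_lom
    return benefit, effectiveness


# Values from the example above: 10% LOM probability, 1% LOC requirement.
benefit, effectiveness = required_loc_benefit(p_lom=0.10, loc_requirement=0.01)
```

With these inputs the required LOC Benefit is about 9 percentage points, which means the combined detection and abort response must succeed in roughly 90% of LOM cases.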

To these dependability-centered metrics one can also add 
estimates of other costs, such as the actual monetary and 
schedule costs. These non-dependability costs are not addressed 
in this paper, but are of course important for this and all 
other applications. They can be estimated in standard ways 
that do not require further elaboration here. 

III. SLS Abort Trigger Analysis Methodology 

This section describes the analysis process used to assess 
the value of Abort Triggers on SLS for the program’s 
Preliminary Design Review (PDR) and one post-PDR cycle of 
analysis. 

A. Abort Trigger Analysis Overview 

The SLS Abort Trigger Analysis can be generalized into the 
following seven major steps. 

• Step One. Identify Abort Triggers to be assessed and 
insert into the Abort Analysis Matrix (AAM) 
spreadsheet. 

• Step Two. Obtain the list of LOM scenarios, each of 
which is a combination of the mission phase, failure 
scenario, and LOM Environment (LOME), modeled by the 
PRA group. 

• Step Three. For each LOM scenario, examine the 
associated PRA minimum cut sets to determine which 
Abort Triggers can detect the failure effects modeled 
in this LOM scenario as primary or secondary 
detections. 

• Step Four. For each LOM scenario and trigger 
identified in Step Three, estimate the percent coverage 
of the LOM risk for each Abort Trigger based on 
associated minimum cut sets, and the corresponding 
minimum, mode and maximum Abortability Table 
Warning Time (ATWT) estimates. 

• Step Five. Based on the LOM scenario’s contributing 
failure scenario and LOME, identify which 
abortability table or tables to be used for the abort 
effectiveness (AE) value lookup, or specify manual 
AE inputs. 

• Step Six. Execute the AE lookup front-end Microsoft 
Excel Visual Basic for Applications (VBA) macro to 
populate AE into the AAM. 

• Step Seven. Format output results for PRA use. 



Figure 2 shows the generalized SLS Abort Trigger Analysis 
steps with brief commentaries on key steps and products. 
Please note that the numbered notes shown in Fig. 2 are not the 
same as the aforementioned Abort Trigger Analysis major 
steps. The key items, such as LOM Scenarios, Abortability 
Tables, and Abort Effectiveness, are described and discussed in 
the remainder of Section III. 


B. Loss of Mission Scenario Identification 

For Abort Trigger analysis, the only failures that are 
relevant are those that threaten the MPCV and crew. As 
described above, whether these are immediate threats due to a 
vehicle that is breaking up or losing control, or whether the 
vehicle is stable but cannot boost the MPCV into the required 
orbit, the mission is lost. Therefore the failure scenarios that 
require aborts are always Loss of Mission scenarios. 
Determining the effectiveness and value of Abort Triggers 
requires estimation of the effectiveness of these triggers in all 
LOM scenarios. This in turn implies that all LOM scenarios 
must be identified. 

In general, failure scenarios describe unique failure 
behaviors, with a unique set of failure responses, with a 
specific system configuration over a specified time period. In 
the SLS Abort Trigger analysis, LOM scenarios specify a 
unique set of failure behaviors with a specific system 
configuration over a specified time period. However, in these 
LOM scenarios, more than one Abort Trigger could be 
activated first. Since only one Abort Trigger can send an abort 
recommendation for an abort response to occur, it only matters 
which Abort Trigger detects the abort condition first and issues 
the abort recommendation (or for non-immediate aborts, the 
warning message). If the first Abort Trigger that could 
potentially detect the abort condition fails to do so, it is usually 
true that another Abort Trigger will detect a later, 
“downstream” failure effect, which is also necessarily an abort 
condition. The analysis methodology maps all Abort Triggers 
that can potentially detect the failure behaviors in a LOM 
scenario, so LOM scenarios themselves do not need to define all 
possible Abort Triggers that can be activated. They need only 
define the behaviors themselves, and then the Abort Triggers 
are mapped into the LOM scenario. In an actual operational 
event, only one of these Abort Triggers will be the one that 
activates first, and this defines an “abort scenario.” Thus within 
a LOM scenario there are several possible abort scenarios, 
depending on which Abort Trigger detects the abort condition 
first. 

As described in the previous major section, the Goal- 
Function Tree provides a method to identify potential Abort 
Triggers and understand their relationships to each other. The 
reason that the GFT aids this understanding is that every path 
in the GFT represents not only a set of needed goals and 
functions, but also the failure behaviors that will occur in a 
specific failure scenario when a goal cannot be achieved. 
Therefore the GFT can be used as a starting point to define 
failure scenarios for a given system [5]. Those failure scenarios 
that lead to direct threats to the crew or to inability to achieve 
orbit are the ones that must be identified. 


[Figure 2: flowchart of the abort analysis process. Recoverable 
box labels: Feedback from Previous Abort Analysis; LOM Scenarios 
Probabilities and Cut Sets (From PRA Group); LOM Scenarios 
Review by M&FM and Domain Experts; SLS STE Abortability Tables; 
SLS GN&C Abort Triggers Monte Carlo Simulations; Determine For 
Each LOM Scenario: Applicable Abort Triggers (Primary or 
Secondary) and Percent Coverage, Best Match Abortability Table, 
ATWT for each Abort Trigger; Execute AE Lookup Excel VBA Script; 
Populate Abort Analysis Matrix; M&FM Abort Analysis Results; 
M&FM Abort Triggers Selection; Format AE Results for PRA.] 

Notes to Fig. 2: 

• New, deletion or changes to existing LOM Scenarios. 

• New or changes to existing Abortability Tables such as: ATWT 
range, FTS delay time, and changes in failure scenarios. 

• GN&C AE estimates for vehicle Loss of Control Scenarios. 

• Percent vehicle thrust vector control failures that can be 
successfully controlled and lead to mission success. SLS 
GN&C provides inputs to PRA prior to LOM Scenarios 
delivery to M&FM. 

• Structures and Environments (STE) Abortability Tables are 

Fig. 2. Simplified SLS M&FM Abort Analysis Process. 




For SLS, LOM scenarios were defined in an iterative 
process that started on the NASA Constellation program and 
Ares I project going back to 2005. This process involved 
several groups, including Mission and Fault Management 
(M&FM), Safety and Mission Assurance (S&MA) PRA, 
GN&C, and Structures and Environments (STE). It took 
several years to work through several iterations of 
understanding about how to define failure scenarios, what level 
of detail is needed, and what the criteria were. When the Ares I 
Project and Constellation Program were cancelled in 2010, the 
transition to what became the SLS program enabled another 
reassessment of the criteria and modeling of failure scenarios 
needed to perform Abort Trigger analysis, leading to the 
criteria defined here. In sum, this was a multi-year, multi- 
organization iterative process, leading ultimately to the proper 
set of criteria that enabled the quantitative analysis described 
here. 

We now know that we can use the GFT representation and 
analyses, as well as the PRA and Hazard cause tree models to 
help define the needed LOM scenarios, and with the experience 
of having done the Abort Trigger analyses, the proper level of 
detail is now understood. Ideally, the LOM scenarios should be 
defined to the level necessary to map the Abort Triggers into 
the unique set of failure effects over relevant ascent time 
periods. These time periods include not just unique vehicle 
configurations, but also differing external environments at 
different altitudes, velocities, and pressures. 

C. Loss of Mission Scenario Probability Estimation 

Once a set of LOM scenarios is identified and agreed to 
by the SLS abort analysis team, the S&MA PRA group 
quantifies the model based upon the NASA PRA Procedures Guide 
for NASA Managers and Practitioners [7, 8], the NASA 
Cross-Program PRA and SLS PRA Plans, and best industry practices. 
A PRA model is a logic model that represents a failure scenario 
or failure outcome for a system. For SLS, fault trees are being 
developed by the SLS S&MA PRA group to assess SLS LOM 
scenarios. The PRA fault tree is an integrated risk model 
representing the SLS LOM probability during flight by 
modeling system and component failure modes and 
dependencies from a LOM failure scenario’s point of view. 
Once the fault tree models are completed, the models are 
populated with failure data in order to quantify the risks. 

PRA uses various fault tree basic events quantification 
techniques depending on the failure mode, design maturity, and 
availability of data. Without going into specifics, typical SLS 
PRA data are grouped into the following general cases: 
functional failure, common cause failure, phenomenological 
failure, external causes, and process or manufacturing errors. 
Detailed descriptions of the various NASA PRA data classes 
can be found in the NASA PRA Procedures Guide for NASA Managers 
and Practitioners [7]. Although SLS is still in the design phase, 
the majority of the hardware used is Shuttle heritage or derived, 
or commercial off the shelf, for which a significant amount of 
reliability data exists. In situations where flight and test data 
are not available, reliability prediction values from similarity 
analysis, similar components, handbook data, and/or domain 
expert judgments are used to initially estimate the risks. The 
results from the SLS PRA model are in the form of fault tree 
minimum cut sets. 

A cut set can be interpreted as a “failure scenario” that 
consists of a single failure or combination of failures that are 
assumed to result in a LOM. A cut set is said to be minimum if 
it cannot be Boolean-reduced further. Automatically generated 
by the PRA software, each minimum cut set includes a 
description of the event or events involved with the 
corresponding probabilities for a specific mission time. By 
default, the probability estimates represent the mean failure 
probability for each minimum cut set. 
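
A minimal sketch of these two ideas follows: reducing a list of cut sets to its minimum cut sets, and computing the probability of one cut set. The function names and event labels are hypothetical, and the product rule assumes the basic events are statistically independent, which the PRA software may not assume.

```python
from math import prod


def minimal_cut_sets(cut_sets):
    """Drop every cut set that strictly contains another cut set.

    If {A} alone causes LOM, then {A, B} adds nothing: it reduces to {A}.
    """
    unique = {frozenset(c) for c in cut_sets}
    return [c for c in unique if not any(other < c for other in unique)]


def cut_set_probability(cut_set, p_event):
    """Probability that all events in the cut set occur (independence assumed)."""
    return prod(p_event[e] for e in cut_set)


# Hypothetical basic events and mean probabilities for one mission phase.
p_event = {"A": 1e-3, "B": 1e-2, "C": 1e-2}
reduced = minimal_cut_sets([{"A"}, {"A", "B"}, {"B", "C"}])
```

Here `{"A", "B"}` is removed because `{"A"}` is already sufficient for LOM, leaving `{"A"}` and `{"B", "C"}` as the minimum cut sets.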

The LOM minimum cut sets and their associated 
probabilities are used in the AAM to understand the specific 
failure causations for a given LOM scenario, which is 
composed of the combination of mission phase, failure 
scenario and LOME. If the basic events’ probability estimates 
contain uncertainty, then a Monte Carlo simulation can be 
used to estimate uncertainty around the mean likelihood 
estimates. The uncertainty analysis provides another crucial 
piece of information to allow for risk-informed decision 
making by understanding the probability intervals or 
“probabilistic estimate variability” for each failure scenario or 
effectiveness of specific design changes, such as hardware 
redundancy, FM protocols and Abort Triggers [3]. Further, 
because uncertainty analysis is required for NASA Cross- 
Program PRA (the PRA organization and model including 
SLS, MPCV, Ground Systems, and Mission Operations) [5], 
the SLS AE estimates generated by SLS M&FM group 
contain uncertainties, which will be described later. Currently, 
the AE uncertainty is only applied to the mean estimates of the 
LOM scenarios provided by the PRA group. For SLS Critical 
Design Review, integration of the AE and LOM scenario 
uncertainties will be implemented to better understand and 
communicate the uncertainty for the AE and LOC benefit 
estimates. 
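
As a rough illustration of the Monte Carlo step mentioned above, the sketch below propagates uncertainty in basic event probabilities through a single cut set. The triangular distributions, the independence of events, the function name, and the numerical bounds are assumptions made for illustration; the source does not state which distributions the SLS PRA uses.

```python
import random
import statistics


def simulate_cut_set(event_bounds, n_samples=10_000, seed=0):
    """Sample a cut set probability from uncertain basic event probabilities.

    event_bounds -- list of (low, mode, high) tuples, one per basic event
    Returns the sample mean and the 5th and 95th percentiles.
    """
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    samples = []
    for _ in range(n_samples):
        p = 1.0
        for low, mode, high in event_bounds:
            # Note the argument order of random.triangular: low, high, mode.
            p *= rng.triangular(low, high, mode)
        samples.append(p)
    samples.sort()
    return {
        "mean": statistics.fmean(samples),
        "p05": samples[int(0.05 * n_samples)],
        "p95": samples[int(0.95 * n_samples)],
    }


# Two hypothetical basic events with assumed uncertainty bounds.
result = simulate_cut_set([(1e-4, 2e-4, 5e-4), (1e-3, 2e-3, 4e-3)],
                          n_samples=2000)
```

The interval between `p05` and `p95` plays the role of the probability interval discussed above, against which a design change such as added redundancy or an additional Abort Trigger could be compared.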

D. Failure Detection and Confirmation - False Positives and False Negatives 

Abort Triggers are failure detection mechanisms, and as 
described in section II.C above, they are assessed using False 
Positive (FP) / False Negative (FN) / True Positive (TP) / True 
Negative (TN) metrics. A desirable Abort Trigger has low FP 
and low FN rates, and conversely high TP and high TN rates. 
FP and FN are logical complements to TP and TN, and on SLS 
the calculations are performed for FP/FN. 

For SLS, Abort Triggers are generally designed to use 
redundant measurements so as to minimize the possibility that 
a failure in the measurements from one string of hardware will 
lead to an abort based on a sensor or measurement failure. 
That is, redundant measurements are used to reduce the false 
positive rate. The way this is phrased on SLS is that there is a 
“detection” and a “confirmation”. This means that an Abort 
Trigger requires two measurements that indicate an abort 
condition exists. 

Calculation of FP/FN for a given Abort Trigger requires the 
application of reliability theory for redundant suites of 
hardware components and for common mode failures, and the 
physics-based assessment of threshold values for triggers that 
monitor continuous state variables. Failures of the components 
involved with a given Abort Trigger, which include 
sensors, data buses, computers, and software algorithms, are 
assessed in the usual way according to classical reliability 
theory, with two significant additions. First, the calculation 
needs to estimate the probability that component failures lead 
to an FP or FN, and incorporate these probabilities into the 
modified traditional reliability equations. Second, and of 
particular relevance for Abort Triggers, is the need to model 
and assess the effectiveness of “Sensor Data Qualification” 
(SDQ). SDQ determines whether the information being used by 
the decision algorithm is valid, and hence it inherently reduces 
the possibility of false positives and false negatives. Because 
SDQ exists to reduce FPs and FNs, the effectiveness of the 
SDQ must itself be modeled. In other words, it is insufficient to 
build a typical, simple reliability block diagram. The models 
used to estimate the effectiveness of Abort Triggers must 
include detailed models of SDQ, to determine if the SDQ 
routines are providing sufficient value to be worthwhile to 
develop. Finally, the FP/FN calculation must also assess the 
common cause failure rates, as it typically happens that when 
the reliability of redundant strings of components is assessed, 
the dominant factor will be the common cause failure rates of 
hardware and software (or of humans, if part of the trigger 
design). 
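
The effect of requiring both a detection and a confirmation, and the way common cause failures come to dominate, can be sketched with the standard beta-factor model. This is an illustrative assumption and not the SLS calculation: the function name and the numerical values are hypothetical, and SDQ is not modeled here.

```python
def false_positive_probability(p_spurious, beta=0.0):
    """Probability that two redundant measurements both falsely indicate
    an abort condition, using a beta-factor common cause model.

    p_spurious -- probability that one measurement string gives a
                  spurious abort indication
    beta       -- assumed fraction of those failures shared by both strings
    """
    # Both strings fail independently of each other.
    independent = ((1.0 - beta) * p_spurious) ** 2
    # One shared cause corrupts both strings at once.
    common_cause = beta * p_spurious
    return independent + common_cause


no_ccf = false_positive_probability(1e-3)               # independent only
with_ccf = false_positive_probability(1e-3, beta=0.05)  # 5% common cause
```

With these assumed numbers, confirmation alone reduces the false positive probability from one in a thousand to about one in a million, but even a 5% common cause fraction raises it again by well over an order of magnitude.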

The other major factor in the calculation of the FP/FN of 
Abort Triggers is the assessment and determination of where to 
set the threshold values that distinguish the difference between 
nominal and failed behavior. For measurements of binary state 
variables, such as whether a bit is set to a 1 or 0, threshold 
values do not matter. However, for continuous variables such 
as pressure, temperature, position, or attitude (angular 
direction), it is essential to specify a threshold that 
differentiates between nominal and failed behavior. In general, 
there is no single threshold value that inherently distinguishes 
nominal from failed behavior. Rather, there is often a range of 
values that can occur when the system is in a nominal or failed 
state. Within this range, setting the threshold to guarantee 
detection of the failure (having a near 0% false negative rate) 
will be susceptible to a high false positive rate, in which the 
detection indicates that a failure exists, when in fact there is no 
failure. Conversely, trying to minimize the false positive rate 
will generally increase the false negative rate. 
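
The trade-off can be made concrete with a small sketch. It assumes, purely for illustration, that the monitored variable is normally distributed in both the nominal and the failed state and that a reading above the threshold declares a failure; the means, standard deviations, and thresholds are hypothetical.

```python
from statistics import NormalDist


def fp_fn_rates(threshold, nominal, failed):
    """Return (false positive rate, false negative rate) for one threshold.

    FP: a nominal reading exceeds the threshold.
    FN: a failed-state reading stays at or below the threshold.
    """
    fp = 1.0 - nominal.cdf(threshold)
    fn = failed.cdf(threshold)
    return fp, fn


# Hypothetical overlapping distributions of a pressure-like variable.
nominal = NormalDist(mu=100.0, sigma=5.0)
failed = NormalDist(mu=120.0, sigma=5.0)

for threshold in (105.0, 110.0, 115.0):
    fp, fn = fp_fn_rates(threshold, nominal, failed)
    print(f"threshold={threshold:6.1f}  FP={fp:.4f}  FN={fn:.4f}")
```

Moving the threshold from 105 to 115 lowers the false positive rate and raises the false negative rate, which is the behavior described above.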

In cases where nominal and failure behaviors overlap a great 
deal, that is, when small changes to FP create large changes to 
FN or vice versa, it may be advisable not to use that state 
variable for failure detection purposes, and to monitor some 
other state variable instead. In this case, the other state 
variable to be monitored 
may have larger separation between nominal and failed 
behaviors. Or it may be that the original state variable can be 
monitored, but additional state variables must also be 
monitored to provide more information to distinguish nominal 
from failed behaviors. 

In any event, determining where to set the threshold is a 
system-specific decision. Where failure cannot be tolerated, but 
false positive rates are acceptable, then the threshold can be set 
to provide a near guarantee that the failure will be detected, but 
at the cost of potentially higher false positive rates. Or the 
system may be biased to minimize false positives, in which 
case there will be a greater chance that a failure will occur that 
will not be detected. 

It should also be noted that changing the threshold values 
does not determine simply whether failure is detected or not, 
but rather is often a case of when the failure is detected. It is 
often true that a failure causes an ever-increasing or decreasing 
value of the state variable, which will diverge further from 
nominal behavior over time. Thus biasing the threshold to 
reduce false positive rates often has the effect of delaying when 
the detection occurs, as opposed to reducing the probability of 
detection as a whole. If there is sufficient time for the system to 
recover from the failure, or in the case of the SLS, for the crew 
to escape the threat, then it may be possible to set the threshold 
to reduce false positives, yet still provide a reasonable amount 
of time for the crew to escape. 
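
A simple sketch shows how a threshold change turns into a detection delay. It assumes a failure that makes the state variable drift away from its nominal value at a constant rate; the function names and numbers are hypothetical.

```python
def detection_delay(threshold, nominal_value, drift_rate):
    """Time after failure onset at which a linearly drifting variable
    crosses the threshold (assumes a constant, positive drift rate)."""
    if drift_rate <= 0:
        raise ValueError("drift_rate must be positive")
    return max(0.0, (threshold - nominal_value) / drift_rate)


def warning_time(threshold, nominal_value, drift_rate, time_to_hazard):
    """Time left for the abort response once the trigger has fired."""
    return time_to_hazard - detection_delay(threshold, nominal_value, drift_rate)


# Raising the threshold from 110 to 115 delays detection by one second
# for a variable drifting at 5 units per second from a nominal value of 100.
tight = warning_time(110.0, 100.0, 5.0, time_to_hazard=6.0)
loose = warning_time(115.0, 100.0, 5.0, time_to_hazard=6.0)
```

In this example the looser threshold still leaves 3 seconds of warning instead of 4, so the false positive reduction is bought with warning time and not with a missed detection.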

For the case of a human-rated launch vehicle, it is not 
inherently obvious which bias should be used. On one hand, 
setting the threshold to protect human life is crucial to provide 
astronauts a means to escape a current or impending threat to 
their safety. On the other hand, for a heavy-lift, deep space- 
capable vehicle such as SLS that will cost perhaps up to a 
billion dollars per flight, aborting a mission that could have 
succeeded is a tremendously expensive decision that could 
even jeopardize the program. To date, astronauts have accepted 
the high risks of space flight, and so the philosophy for setting 
Abort Trigger thresholds is not biased towards crew safety so 
as to create a large risk of false positives and unnecessary 
aborts. The discussion about the appropriate balance remains 
an ongoing debate, but it is clear that having a high false 
positive rate is not acceptable, just as having high risks to the 
crew is also unacceptable. The relative impact of changing 
threshold values on crew escape times is a large factor in the 
threshold design. 

E. Crew Survivability - Explosion Dynamics 

When a launch vehicle’s energetic systems, such as solid 
rocket motors or liquid propellant engines, experience an 
uncontained failure, the failure effects can propagate to the 
surrounding systems and lead to potential detonation of the core 
stage. This explosion is the primary source of hazards to crew 
survivability during an abort, generating a blast overpressure 
wave, a fireball, and a debris/fragmentation field directed 
toward the escaping crew module [9, 10]. 

The severity of these hazards on the crew module depends 
on several variables that must be accounted for in the abort 
analysis. These variables are: nature and severity of the launch 
vehicle failure; failure propagation from element failure to 
vehicle explosion; vehicle and launch abort vehicle trajectories; 
design of the crew capsule and launch escape system (such as 
structural strength); propagation of the explosion dynamics to 
the crew capsule; abort warning time provided by the launch 
vehicle Abort Trigger; and attitude of the crew capsule when 
the hazards reach it [9, 10]. To ensure consistency and 
traceability in the SLS and MPCV abort analyses, the design 
capability of the crew module to withstand the aforementioned 
crew-threatening explosion hazards is described in a set of 
MPCV abort environment table limits, which are also being used 
as design requirements for SLS to meet either by design or 
analysis. 

The SLS STE group is responsible for modeling the interactions 
of SLS element explosion dynamics, starting from characterization 
of the potential impact of the initial failure manifestation, or 
LOME, and the resulting abort environments, with the MPCV launch 
abort vehicle and its vulnerabilities. This entails the analysis of 
the likelihood that a failure producing a given LOME will 
propagate to other elements and ultimately to vehicle 
explosion, and the impact on the MPCV structure. The list of 
LOMEs is generated through discussions with SLS PRA and 
M&FM personnel with the objective of providing complete 
coverage of possible failure outcomes at a level of refinement 
sufficient to enable the identification of leading crew risks and 
effective mitigation strategies. The LOMEs include Core Stage 
(CS) engine explosion, booster burst, pad explosion, CS 
external explosion, and CS intertank internal explosion. 

STE develops abortability tables as functions of mission 
phase and available warning time by integrating the effects of 
failure propagation with characterizations of the environment 
severity. The environment severity is developed by integrating 
characterizations of the failure environment initiation and 
propagation with MPCV abort trajectories and vulnerabilities. 
Results are time-averaged across mission phases, with results 
for each phase provided as functions of warning time in rows 
that are identified by the Mission Event Time at the midpoint 
of the associated mission phase. For SLS PDR, twelve 
abortability tables were generated. Each abortability table is 
contained within a separate Microsoft Excel workbook, and 
each workbook contains three worksheets labeled Best, Worst, 
and Base representing the three types of modeling and failure 
propagation assumptions. Best case assumptions typically show 
higher levels of abortability resulting from the use of more 
optimistic assumptions regarding propagation and the 
application of more benign explosion environments. The 
lowest levels of abortability, typically shown in the Worst 
sheet, result from pessimistic assumptions regarding both 
propagation and environment severity. The abortability values 
in the Base sheet result from most likely estimates of 
propagation and environments, with some remaining 
conservative bias. 

For use with SLS PRA, each AE “estimate” is composed of 
three values: worst, base and best assumptions, and for 
explosion cases these are extracted from the STE abortability 
tables based on the ATWT estimated by M&FM for each LOM 
scenario based on a worst-on-worst, base-on-base and best-on- 
best type of bounding analysis. That is, the worst AE is based 
on the shortest abort warning time with the worst case 
abortability table. Best AE corresponds to longest abort 
warning time with best case abortability table, and Base AE 
corresponds to median abort warning time. 
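
The bounding lookup can be sketched as follows. The table layout, the linear interpolation between rows, the clamping at the table ends, and all numerical values are assumptions made for illustration; the actual tables are Excel workbooks and the source does not describe how values between rows are handled.

```python
from bisect import bisect_right


def lookup_ae(rows, warning_time):
    """Look up abort effectiveness for a warning time.

    rows -- list of (warning_time_s, ae) pairs sorted by warning time,
            for one mission phase of one abortability table sheet
    Values between rows are linearly interpolated (an assumption), and
    warning times outside the table are clamped to the end rows.
    """
    times = [t for t, _ in rows]
    if warning_time <= times[0]:
        return rows[0][1]
    if warning_time >= times[-1]:
        return rows[-1][1]
    i = bisect_right(times, warning_time)
    (t0, a0), (t1, a1) = rows[i - 1], rows[i]
    return a0 + (a1 - a0) * (warning_time - t0) / (t1 - t0)


def ae_estimate(table, atwt_min, atwt_mid, atwt_max):
    """Worst-on-worst, base-on-base, best-on-best AE values."""
    return {
        "worst": lookup_ae(table["Worst"], atwt_min),
        "base": lookup_ae(table["Base"], atwt_mid),
        "best": lookup_ae(table["Best"], atwt_max),
    }


# Hypothetical three-sheet table for a single mission phase.
table = {
    "Worst": [(0.0, 0.2), (2.0, 0.6)],
    "Base": [(0.0, 0.4), (2.0, 0.8)],
    "Best": [(0.0, 0.6), (2.0, 1.0)],
}
estimate = ae_estimate(table, atwt_min=0.5, atwt_mid=1.0, atwt_max=2.0)
```

Pairing the shortest warning time with the Worst sheet and the longest with the Best sheet gives the widest plausible AE range for the scenario.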

F. Crew Survivability - Loss of Control 

Another typical kind of threat to the crew is when the SLS 
vehicle is unable to maintain control. In general this means that 
its attitudes (the direction that it is pointed) and attitude rates 
(the rate of change of pointing direction) diverge from nominal 
such that the vehicle will not achieve orbit, or will threaten 
SLS or MPCV structural load limits, hence causing a “structural 
demise.” Assessing the SLS vehicle’s ability to control the 
vehicle’s attitudes and rates, and the guidance to achieve orbit 
is the job of the SLS GN&C group. This group performs 
typical launch vehicle GN&C analyses using physics-based 
Monte Carlo simulations and stability analyses. For the SLS, it 
also performs analyses of the GN&C subsystem’s ability to 
detect failures to control through detecting diverging attitude 
errors and rate errors. 

For SLS Abort Trigger analysis, GN&C uses the same 
kinds of physics-based, Monte Carlo methods as it uses for 
typical “nominal” guidance and control analyses. These 
methods are supplemented by modeling hypothetical Abort 
Triggers that monitor attitude and rate errors and other GN&C 
state variables as appropriate, with corresponding hypothetical 
thresholds. As described in Section III.D above, the threshold 
values are set based on assumed requirements for False 
Positive and False Negative rates, and analyzed against a 
variety of failures whose effects manifest themselves in GN&C 
state variables. Since these thresholds are set based on the 
needed FP and FN rates, and against SLS and MPCV structural 
limits and orbital performance metrics, the GN&C Abort 
Triggers are generally effective in detecting failures that 
manifest themselves through attitude and rate errors before a 
structural demise and explosion occurs. Abort responses are 
usually able to enable the MPCV to escape the resulting 
explosion, at least when the Launch Abort System is available. 

Additionally, GN&C Abort Triggers provide a secondary 
benefit of detecting some situations in which the vehicle 
structure has collapsed or when an explosion has occurred. 
While not designed specifically to address these cases, if the 
SLS CS Flight Computers remain active and are not yet 
destroyed by an explosion, the GN&C Abort Triggers can 
detect that an explosion has occurred, causing the vehicle to 
lose control. In these cases the GN&C Abort Triggers do not 
provide much warning time, because the explosion is already 
occurring before the crew will be able to activate an abort 
response. 

G. Crew Survivability - Benign Scenarios 

When a failure occurs that does not result in an immediate 
launch vehicle explosion or loss of control, and the launch 
vehicle continues a stable, controllable flight, the LOM scenario 
is said to be benign. One example of a benign LOM scenario is an 
abort resulting from a false positive detection from an Abort 
Trigger. In this case, the MPCV aborts off a perfectly fine 
launch vehicle that is flying in a nominal trajectory. Another 
benign LOM scenario is the shutdown of a single CS engine, 
or “single engine out,” that requires an abort at some point 
later in the flight. For SLS PDR, it was assumed that for benign 
LOM scenarios the AE value, as confirmed by the SLS STE 
group’s abort dynamics analysis, is 100% off the 
SLS vehicle. That is, there are no threats from SLS that will 
impede crew capsule abort. 

Even if the SLS poses no threat in benign scenarios, this 
does not mean that the abort has no risks at all. There are 
always risks due to MPCV/crew capsule failures such as the 
Launch Abort System (LAS) not firing or a failure to deploy 
parachutes. These risks are captured in the Cross-Program PRA, 
where successful crew capsule aborts off SLS are linked to 
appropriate MPCV abort models that include both failure of the 
MPCV abort system and the physical abort environment. 

H. Abort Analysis Matrix 

The Abort Analysis Matrix (AAM) is a Microsoft Excel 
spreadsheet that implements the M&FM group’s model of SLS 
Abort Trigger effectiveness. It uses the LOM scenarios 
provided by PRA group and their associated probabilities, 
delineated by vehicle mission phase. In the AAM’s M&FM 
Input sheet/tab, each row represents a LOM scenario, that is, a 
specific combination of the FS, LOME and mission phase. 
AAM columns represent candidate Abort Triggers to be 
assessed. This format allows for all candidate Abort Triggers to 
be assessed for their value in each LOM scenario. 

Each Abort Trigger entry in the AAM consists of five sub 
headers. The first sub header consists of four columns that 
define the performance characteristics of the trigger when 
activated in a given LOM scenario. They are: 

• Trigger Detection Class: Indicates if the trigger will be 
the first one to detect the failure behaviors in the LOM 
scenario (i.e. Primary) or if the trigger will detect the 
failure behaviors only after a primary trigger fails to 
detect the scenario failure behaviors (i.e. Secondary) 

• False Negative: A percentage of the time that the 
trigger will not detect the failure behavior in the LOM 
scenario. 

o For Primary triggers, any FN probability will 
be potentially detected by a Secondary trigger, 
or if there is no Secondary trigger, it is assumed 
that the remaining probability will lead to LOC. 

o For Secondary triggers, it is assumed that any 
FN probability will lead to LOC. Any possible 
Tertiary triggers are ignored as having 
extremely low probability. 

• Trigger Probabilistic Split: A percentage that indicates 
the portion of the LOM scenario probability that can be 
picked up or monitored by this specific trigger relative 
to other Abort Triggers in its trigger detection class. 
That is, the trigger probability splits for a primary 
trigger are associated only with other primary triggers, 
and secondary triggers only with other secondary 
triggers. As an example, if two triggers A and B are 
primary, and C is secondary, then a 60/40 split of A to 
B means that 60% of the given LOM scenario 
probability will be detected first by A, and 40% by B. 
Only false negative probabilities of A and B are 
potentially detected by C, which captures 100% of the 
FN probabilities of A and B. 

• ATWT Uncertainty Distribution Type: Uncertainty 
distribution associated with estimated ATWT inputted. 
Currently, only uniform and triangular distributions are 
used. This is used to determine the mean of ATWT 
from the inputted Min, Mode and Max values, and for 
future Monte Carlo simulation. 
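
The Trigger Detection Class, False Negative, and Trigger Probabilistic Split fields above can be combined as in the following sketch, which reproduces the A, B, C example. The function name, the data layout, and the False Negative values are hypothetical; the AAM itself is an Excel spreadsheet.

```python
def allocate_detections(p_lom, primaries, secondaries):
    """Split one LOM scenario probability among its Abort Triggers.

    primaries, secondaries -- dict of name -> (split, false_negative),
    where the splits within each detection class sum to 1.0.
    Returns (probability detected by each trigger, undetected probability).
    Undetected probability is assumed to lead to LOC, as in the text.
    """
    detected = {}
    missed = 0.0
    for name, (split, fn) in primaries.items():
        detected[name] = p_lom * split * (1.0 - fn)
        missed += p_lom * split * fn
    # With no Secondary trigger, everything the Primaries miss is undetected.
    undetected = missed if not secondaries else 0.0
    for name, (split, fn) in secondaries.items():
        detected[name] = missed * split * (1.0 - fn)
        undetected += missed * split * fn
    return detected, undetected


# A and B are Primary with a 60/40 split; C is the only Secondary.
detected, undetected = allocate_detections(
    p_lom=1e-3,
    primaries={"A": (0.6, 0.1), "B": (0.4, 0.2)},
    secondaries={"C": (1.0, 0.5)},
)
```

The detected and undetected probabilities always sum back to the scenario's LOM probability, which is a useful consistency check on the split inputs.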

The remaining four sub headers delineate probabilistic 
distribution attributes associated with the trigger. Each sub 
header, titled Minimum (Worst Case), Mode (Most Probable 
Case), Maximum (Best Case), and Mean (Average Case), has 
two columns, ATWT and AE. 

With the format of the matrix defined, with the LOM 
scenario data as rows in the matrix and the Abort Triggers and 
their associated parameters as columns, the process of filling in 
the matrix can begin. The first step is to analyze each LOM 
scenario to determine which of the triggers will detect it first. If 
a trigger can directly detect the results of a LOM scenario then 
it is set as a Primary trigger by placing an “X” in the field that 
matches up with that trigger’s Trigger Detection Class field 
and the LOM scenario that is being analyzed. If multiple 
triggers can act as Primaries, then it is necessary to allocate a 
certain percentage of the specific LOM scenario probability to 
each of them. This allocation is based on an M&FM 
assessment of the failure causes modeled in the specific LOM 
scenario to determine what fraction of the resulting effects 
could be picked up by the trigger. In the current structure of the 
AAM, it is not possible for a primary trigger to act as 
secondary. It was discovered later in the analysis that there are 
some cases where a trigger could potentially be a primary or a 
secondary. This issue is dealt with through approximations 
now, but may be fixed in future versions of the AAM. The sum 
of the primary and secondary trigger split fractions for each 
LOM scenario will be 100%. 

The AAM calculation sheet is set up in similar manner as 
the M&FM Input sheet. It consists of a matrix that has the 
LOM scenarios and associated data as rows and the Abort 
Triggers and associated data as columns. The calculation sheet 
reads in data from the M&FM Input sheet. Each trigger has a 
set of calculations that it performs for each LOME scenario; 
they are as follows: 

• True Positive LOM: Portion of LOM scenario 
probability that is successfully detected by a given 
trigger. 

• Primary Trigger FN Probability: Portion of LOM 
scenario probability that is not detected by primary 
triggers due to FN. 

• False Negative LOC: Portion of LOM scenario 
probability that is not detected by either primary or 
secondary triggers, resulting in LOC. 

• AE LOC Residual: Probability associated with failure 
to abort successfully even if a trigger properly detects 
the failure behavior in the LOM scenario. 

• LOC Benefit: Probability that a given Abort Trigger 
saves the crew from immediate launch vehicle failure 
given that it detected the failure behavior. 

From the definitions above, the True Positive LOM (TP LOM) 
probability of a primary trigger ($LOM_{TPx}$) is related to the 
primary triggers’ probabilistic splits ($Split_{\%x}$) of the 
remaining LOM scenario probability that is not subject to 
primary trigger FN ($FN_{\%x}$), see (1). 

$LOM_{TPx} = Split_{\%x} \times LOM_{Tx} \times (1 - FN_{\%x})$    (1) 



Accounting for the AE associated with the specific LOM scenario 
and primary trigger, the LOC Benefit of the primary trigger is 
simply the product of the AE and the TP LOM of the primary 
trigger calculated from (1), see (2). 

$LOC_{Bx} = AE_{\%x} \times LOM_{TPx}$    (2) 

Because secondary triggers only operate on the FN of primary 
triggers, the calculation of the TP LOM of secondary triggers 
($LOM_{TPy}$) is similar to that of $LOM_{TPx}$, but it is applied 
to the total primary trigger FN probability and to the portion 
of that probability that is not subject to secondary trigger FN 
($FN_{\%y}$), see (3). 

$LOM_{TPy} = Split_{\%y} \times (1 - FN_{\%y}) \times FN_{x\,Total}$    (3) 

where 

$FN_{x_i} = LOM_{TPx_i} \times \frac{FN_{\%x_i}}{1 - FN_{\%x_i}}$ 

$FN_{x\,Total} = \sum_i FN_{x_i} = \sum_i LOM_{TPx_i} \times \frac{FN_{\%x_i}}{1 - FN_{\%x_i}}$    (4) 
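
The factor $FN_{\%x_i} / (1 - FN_{\%x_i})$ in (4) follows from (1). The short derivation below is added for clarity and uses only the symbols already defined. It assumes that the portion missed by primary trigger $i$ is its split of the LOM scenario probability multiplied by its FN percentage.

```latex
\begin{align*}
FN_{x_i} &= Split_{\%x_i} \times LOM_{Tx} \times FN_{\%x_i} \\
         &= \bigl[ Split_{\%x_i} \times LOM_{Tx} \times (1 - FN_{\%x_i}) \bigr]
            \times \frac{FN_{\%x_i}}{1 - FN_{\%x_i}} \\
         &= LOM_{TPx_i} \times \frac{FN_{\%x_i}}{1 - FN_{\%x_i}}
\end{align*}
```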


The LOC Benefit of a secondary trigger ($LOC_{By}$) is calculated 
similarly to the LOC Benefit of a primary trigger; it is 
based on the AE associated with the specific LOM scenario 
and secondary trigger, and the TP LOM of the secondary trigger, 
see (5). 

$LOC_{By} = AE_{\%y} \times LOM_{TPy}$    (5) 

With all of the calculations completed, the TP LOM and LOC 
Benefit values are summed for each of the triggers and 
compared in a separate worksheet. Because the distribution of 
ATWT in turn determines the uncertainty distribution of AE, 
the overall trigger LOC Benefit can be assessed as a 
distribution or a “range of values”. This allows for initial 
estimates of the uncertainty spread without performing 
detailed Monte Carlo simulations, by gauging the 
worst-on-worst vs. best-on-best vs. “Most Probable” vs. 
average benefit that each of the triggers provides within the 
analyzed trigger suite. 

The process described above is used for any particular 
trigger set of interest. To assess the benefit of adding or 
removing one or more triggers, one simply modifies the 
trigger set’s columns and repeats the process. This is 
necessary because of the intricacies of assigning primary and 
secondary triggers and trigger splits to each of the Trigger 
vs. LOM scenario locations on the spreadsheet. The addition or 
removal of triggers often alters the trigger split fractions. 
By repeating the analysis with a single additional candidate 
trigger, a comparison can be made between sets of triggers. 
This gives a Delta LOC Benefit between one set of triggers 
and another set of triggers. 


I. Abortability Tables Lookup Script 

To automate the AE lookup from the abortability tables, a 
set of Microsoft Excel VBA macros was written. Several 
enhancements have been made to the AE lookup macros 
since their first revision to increase their speed, capability 
and simplicity. The macros are housed within a separate 
Microsoft Excel file called the Front End, which contains key 
information regarding the file names of the AAM and STE 
abortability tables, and user-friendly features. The use of 
custom combinations of abortability tables, booster FTS delay 
time and LOM scenarios subjected to booster free flyers is 
embedded in the specific revision of the AAM, which allows a 
unique AAM file to be adapted for a specific analysis case. 

The Microsoft Excel VBA macro performs the following 
actions upon execution. 

1. Open AAM and associated abortability tables 
Microsoft Excel files 

2. Step through each LOM scenario row in the AAM, 
checking for missing ATWTs for applicable Abort 
Triggers 

3. Step through each LOM scenario row and read in 
identified mission phase and abortability table type 
used 

4. For each LOM scenario row, step through all Abort 
Triggers and look up the AE from the abortability 
table based on ATWT, mission phase and abortability 
table name; fill the relevant AAM cell with the AE 
result 

5. Save the AAM and duplicate the results, then repeat 
Steps 3 and 4 to account for booster free flyer risk if the 
user chooses. 

6. Prompt the user that the macro has ended, and close 
appropriate data files. 
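
Step 4 can be sketched in Python as follows. This is only an illustration: the macros themselves are Excel VBA, and everything specific here is an assumption that is not taken from the paper. That includes the table layout (a dictionary keyed by table name and mission phase, holding warning-time breakpoints with AE values), the breakpoint rule (use the AE of the largest tabulated warning time that does not exceed the ATWT), and all names and numbers.

```python
from bisect import bisect_right

# Hypothetical abortability table: (table name, mission phase) -> list of
# (warning time in ms, AE) pairs sorted by warning time. Values are made up.
ABORTABILITY_TABLES = {
    ("base", "phase_1"): [
        (-1000.0, 0.05), (-800.0, 0.10), (-500.0, 0.40), (0.0, 0.90),
    ],
}


def lookup_ae(table_name, phase, atwt_ms, tables=ABORTABILITY_TABLES):
    """Return the AE for the largest tabulated warning time <= atwt_ms.

    Assumed rule: an ATWT below the first breakpoint gives an AE of 0.0.
    """
    rows = tables[(table_name, phase)]
    times = [t for t, _ in rows]
    idx = bisect_right(times, atwt_ms) - 1
    return rows[idx][1] if idx >= 0 else 0.0


def fill_ae(scenarios, tables=ABORTABILITY_TABLES):
    """Fill the AE for every (scenario, trigger) pair that has an ATWT."""
    for scenario in scenarios:
        for trigger in scenario["triggers"].values():
            if trigger.get("atwt_ms") is None:
                continue  # a missing ATWT would be flagged in Step 2
            trigger["ae"] = lookup_ae(
                scenario["table"], scenario["phase"], trigger["atwt_ms"], tables
            )
    return scenarios


scenarios = [{
    "table": "base",
    "phase": "phase_1",
    "triggers": {
        "P1": {"atwt_ms": 0.0},
        "P2": {"atwt_ms": -500.0},
        "S1": {"atwt_ms": -800.0},
    },
}]
fill_ae(scenarios)
print({name: t["ae"] for name, t in scenarios[0]["triggers"].items()})
```

A real table may call for interpolation between breakpoints rather than a step lookup; the step rule is used here only to keep the sketch short.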


For inputs into the PRA software, the resulting AE for each 
LOM scenario is reviewed and formatted as a triangular 
distribution. A triangular distribution was used based on a 
recommendation by STE based on the understanding of the 
physics of failure, and modeling and propagation assumptions 
used. The mode of the triangular distribution is set to the base 
AE, and the lower and upper bound are set to the worst and 
best AEs. 
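
The triangular formatting can be illustrated with a short Python sketch that draws AE samples from a triangular distribution whose mode is the base AE and whose bounds are the worst and best AEs. The function name and the AE values are hypothetical, and the PRA software's own sampling is not described in this paper, so this only shows what such an input distribution implies.

```python
import random


def sample_ae(worst, base, best, n=10000, seed=0):
    """Draw n AE samples from a triangular distribution.

    worst and best are the lower and upper bounds; base is the mode.
    """
    rng = random.Random(seed)
    # random.triangular takes its arguments in the order (low, high, mode).
    return [rng.triangular(worst, best, base) for _ in range(n)]


# Illustrative AE values only (not SLS data).
samples = sample_ae(worst=0.20, base=0.40, best=0.70)
sample_mean = sum(samples) / len(samples)
analytic_mean = (0.20 + 0.40 + 0.70) / 3.0
print(round(sample_mean, 3), round(analytic_mean, 3))
```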


IV. Application of Methodology 

This section describes how the analysis methods described 
in the previous section are applied to specific SLS Abort 
Trigger-related problems. 

A. LOC Benefit of Abort Trigger Suite 

The most important analysis that is performed using the 
methods described in section III is the selection of the Abort 
Trigger suite for the SLS vehicle. As described previously, the 
primary metric of value of an Abort Trigger is the benefit it 
provides for the crew to escape safety-threatening, SLS-caused 
hazards. If the crew successfully escapes, a potential Loss of 
Crew situation becomes merely a Loss of Mission, and the 
amount of LOC that is avoided by the existence of an Abort 
Trigger is the value of that Abort Trigger, its LOC Benefit 
value. 

The SLS vehicle will use several Abort Triggers, which are 
collectively known as the Abort Trigger suite. To analyze the 
value of the suite as a collective, the suite of triggers is 
added to the Abort Analysis Matrix and evaluated as described 
in the previous section. This analysis assigns the entire 
amount of LOC Benefit in a failure scenario to the Abort 
Trigger that activates first. When performed in this way, the 
analysis provides an accurate assessment of the value of the 
Abort Trigger suite as a whole. However, the individual LOC 
Benefit values for each trigger might be considered somewhat 
misleading, as will be described in the next section. 
Nonetheless, this is the simplest method of LOC Benefit 
evaluation, and it provides a clear-cut estimate of the value 
of each Abort Trigger with the simplest interpretation. 

B. Delta LOC Benefit of Added Abort Trigger 

While the LOC Benefit calculation method described above 
in Section IV.A provides the simplest calculation and clearest 
interpretation of results, it is in some ways misleading. This is 
because the benefit of an Abort Trigger should ideally be 
measured as the change in LOC Benefit that is provided by 
adding this trigger to the previously existing set of triggers. 
This is the “Delta LOC Benefit” method. The major difficulty 
with this method is that the value of any single trigger depends 
on the value of all the previously selected triggers, and this in 
turn means that the order in which triggers are selected 
generates a different Delta LOC Benefit value for any added 
trigger! 

Let us assume for the sake of argument that the SLS vehicle 
is designed initially without any Abort Triggers, which means 
that any detection of failures would occur by detection of 
hazards from the MPCV or Mission Operations, or when on the 
ground, from ground systems. These detections and resulting 
abort responses will provide some finite amount of LOC 
Benefit, which can be calculated in the manner described in 
Section III. To determine which Abort Triggers on the SLS will 
provide maximum improvement to LOC, one could ideally 
assess all possible Abort Triggers one by one to determine the 
added LOC Benefit they provide compared to the off-vehicle 
set of triggers. Based on this comparison, one can then select 
the trigger with the maximum LOC Benefit, as long as its 
monetary and schedule costs are low, and as long as its 
projected False Positive rate is low enough. 

Once selected, this new trigger is added to the existing suite 
of off-board triggers, yielding a trigger suite consisting of the 
off-board set plus one on-board trigger. This new suite can be 
assessed as a group to determine its collective LOC Benefit. 
The analyst can then once again assess all remaining potential 
triggers and calculate their Delta LOC Benefits. Again, one can 
select the next-best trigger, add it to the suite, assess as a group, 
and repeat the process. This process continues until the Delta 
LOC Benefits of the remaining potential triggers are all too 
low to justify adding them to the suite. Because every 
Abort Trigger has a cost in additional Losses of Mission due to 
False Positives, once the Delta LOC Benefit of potential 
triggers begins to approach the probability value of the Delta 
LOM Cost due to the false positives of adding that trigger, it is 
no longer worthwhile to add the new trigger. For launch 
vehicle aborts, every Loss of Mission also produces a small 
additional Loss of Crew, because there are risks associated with 
every abort. Thus the added costs are the added LOM and LOC 
values due to the False Positive rate of the added trigger. 
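
The iterative selection just described can be sketched as a greedy loop. The following Python is a minimal illustration under stated assumptions, not the SLS process: `suite_benefit` and `fp_cost` are hypothetical stand-ins for a full AAM evaluation and a False Positive cost estimate, the stopping rule (a candidate is kept only if its Delta LOC Benefit exceeds its False Positive cost) is a simplification, and the toy coverage data are invented.

```python
def greedy_trigger_selection(candidates, suite_benefit, fp_cost, baseline=()):
    """Greedily add the trigger with the largest Delta LOC Benefit.

    candidates    -- iterable of candidate trigger names
    suite_benefit -- callable(tuple of triggers) -> LOC Benefit of that suite
    fp_cost       -- callable(trigger) -> added LOC due to its False Positives
    baseline      -- triggers already present (e.g. the off-board set)
    """
    suite = tuple(baseline)
    remaining = list(candidates)
    history = []
    while remaining:
        current = suite_benefit(suite)
        # Delta LOC Benefit of each remaining candidate given the current suite.
        deltas = {t: suite_benefit(suite + (t,)) - current for t in remaining}
        worthwhile = {t: d for t, d in deltas.items() if d > fp_cost(t)}
        if not worthwhile:
            break  # no remaining trigger is worth its False Positive cost
        best = max(worthwhile, key=worthwhile.get)
        suite += (best,)
        remaining.remove(best)
        history.append((best, deltas[best]))
    return suite, history


# Toy data (invented): which scenarios each trigger detects, and the LOC
# probability avoided when a scenario is detected by at least one trigger.
coverage = {"A": {"s1", "s2"}, "B": {"s2", "s3"}, "C": {"s4"}}
scenario_loc = {"s1": 4e-6, "s2": 3e-6, "s3": 1e-6, "s4": 2e-7}


def toy_benefit(suite):
    covered = set().union(*(coverage[t] for t in suite))
    return sum(scenario_loc[s] for s in sorted(covered))


suite, history = greedy_trigger_selection(
    ["A", "B", "C"], toy_benefit, fp_cost=lambda t: 5e-7
)
print(suite)
print(history)
```

In this toy case trigger B is worth about 4e-6 on its own but only about 1e-6 once A is already in the suite, which illustrates the order dependence of the Delta LOC Benefit.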

While the sort of analysis described above can in theory be 
performed, this was not done for the SLS vehicle because of 
the relative complexity of the method and because of the 
schedule and resource constraints of performing the work 
while meeting program schedules and deliveries. However, 
with the right kind of tool (which does not yet exist), this 
Delta LOC analysis process could be automated and done 
cost-effectively in the future. 

C. Free-Flying Booster Risk 

Booster free-flying (BFF) risk addresses the risk to the 
escaping crew module due to a breakaway, or rogue, booster. 
Currently, there is a communication signal between the booster 
and the core stage to address the public safety concern of a 
runaway solid booster. This communication signal activates the 
booster’s Flight Termination System (FTS) when the booster 
becomes detached. The breakaway booster scenario can occur 
due to failure of the booster to core stage attachment 
points during nominal flight, during a vehicle failure such as 
vehicle explosion or loss of control, or after initiation of an 
abort. Depending on when the booster breaks away from the 
core stage, the average LAS acceleration and head start time, 
and the FTS delay time, there is a chance that the booster can 
catch up to or overtake the escaping MPCV and detonate. The 
FTS detonation releases booster fragments onto the crew 
module if the booster is next to the MPCV at the time of 
detonation. If the booster is in front of the MPCV at the time 
of detonation, then the MPCV will fly into the resulting debris 
field. If the booster is behind the aborting MPCV at the time 
of detonation, then the distance and relative velocity between 
the two must be taken into account in the calculation of the 
likelihood of debris strikes on the MPCV. 

The time at which the booster breaks away is important, as 
the acceleration, velocity, and resulting position of the booster 
relative to the MPCV depend on the atmospheric density and 
the booster propellant mass fraction. A heavier booster at 
low altitude has more mass to carry, which reduces its 
acceleration and therefore its ability to catch up to the 
MPCV, as compared to a booster that breaks away higher up in 
the atmosphere, with less friction and less mass, and thus 
higher booster acceleration. For the initial assessment, a 
simple one-dimensional trajectory of the MPCV and rogue 
booster was used, and it was assumed that the booster breaks 
away when the launch vehicle explodes. The time at which the 
launch vehicle explodes is also a primary factor in the 
calculation of the ATWTs and abortability tables, and thus the 
assumption that the booster breaks away at the same time as 
the launch vehicle explosion made the determination of the 
LAS head start time very easy. 

For the actual AE assessment, the aforementioned 
Microsoft Excel VBA macro is used, and BFF risk is assessed 
in Step 5 of the Abortability Tables Lookup Script steps. 
This risk is added to the risks described above for 
benign, loss of control, and explosion failures. If the MPCV 
does not survive the initial SLS failure, then LOC already 
exists and any additional BFF risk does not matter. If the 
MPCV survives the SLS failure, then the BFF risk is applied. 
The BFF risk is also applied to all applicable LOM scenarios, 
such as premature booster separation or vehicle explosion. To 
account for the variability in the FTS activation delay, which 
for a generic FTS can be up to +/- 1 sec, a bounding assessment 
similar to the worst ATWT on worst abortability assumptions 
is used. This results in worst-on-worst-on-worst or 
best-on-best-on-best bounding cases. That is, the worst 
(shortest) FTS delay time AE is operated on the worst ATWT 
and worst abortability table. The final AE for a particular 
LOM scenario i and trigger j is shown in (6): 


$AE_{LOM\ Scenario_{ij}}^{BFF} = AE_{LOM\ Scenario_{ij}}^{PreBFF} \times AE_{BFF \mid Phase,\ ATWT_{ij},\ FTS\ Delay}$    (6) 

D. Example Calculation 

The following example demonstrates the calculation 
process described above. All of the data used in this example 
are fictitious due to the sensitivity of the actual data. First, 
the LOM Scenario probability is needed. For this example a 
LOM Scenario mean probability of $1\times10^{-5}$ is used. For the 
selection of Abort Triggers, two primary triggers, P1 and P2, 
and one secondary trigger, S1, are assumed. In lieu of 
estimating the Abort Triggers’ Probabilistic Split Fraction, FN, 
ATWT, and AE based on the physics of failure and propagation 
and the Abortability Table, the values in Table I are assumed. 
For the sake of simplicity, only the Mode ATWT and AE values 
are listed. The calculation steps are the same for minimum and 
maximum parameters. 

Table I. Abort Analysis Sample Calculation Data 

Parameter                 Trigger P1   Trigger P2   Trigger S1 
Trigger Split Fraction    80%          20%          100% 
False Negative            5%           5%           1% 
Mode ATWT (ms)            0            -500         -800 
Mode AE                   90%          40%          10% 

The sample data shown in Table I are not specifically related 
to the SLS launch vehicle, but represent a characteristic 
tendency in the physics of launch vehicles, in which Primary 
Triggers detect the failure first and hence provide more 
warning time for the crew to escape, and Secondary Triggers 
provide less time to escape, which yields lower AE [9, 10]. 

First, assess the first primary trigger, P1, by calculating its 
TP LOM using (1): 

$LOM_{TPx_1} = 80\% \times 1\times10^{-5} \times (1 - 5\%) = 7.60\times10^{-6}$    (7) 

Now that Trigger P1’s TP LOM is known, it can be used to 
calculate its LOC Benefit using (2) and the result from (7): 

$LOC_{Bx_1} = 90\% \times 7.60\times10^{-6} = 6.84\times10^{-6}$    (8) 

Next, repeat the above calculations using the data for the 
second Primary Trigger, P2, to obtain its TP LOM and LOC 
Benefit values: 

$LOM_{TPx_2} = 20\% \times 1\times10^{-5} \times (1 - 5\%) = 1.90\times10^{-6}$    (9) 

$LOC_{Bx_2} = 40\% \times 1.90\times10^{-6} = 7.60\times10^{-7}$    (10) 

Now that the TP LOM and LOC Benefit are calculated for the 
Primary Triggers, the total Primary Trigger FN Probability 
must be calculated for use in the Secondary Trigger 
calculations using (4): 

$FN_{x_1} = 7.60\times10^{-6} \times \left(\frac{5\%}{1 - 5\%}\right) = 4.00\times10^{-7}$ 

$FN_{x_2} = 1.90\times10^{-6} \times \left(\frac{5\%}{1 - 5\%}\right) = 1.00\times10^{-7}$ 

$FN_{x\,Total} = FN_{x_1} + FN_{x_2} = 5.00\times10^{-7}$    (11) 

For Secondary Triggers, the only probabilities that are 
detectable are the FN of the Primary Triggers; thus the need to 
calculate the total Primary Trigger FN probability. Using (3), 
the result from (11), and Trigger S1’s data, the TP LOM of the 
secondary trigger can be calculated: 

$LOM_{TPy_1} = 100\% \times (1 - 1\%) \times 5.00\times10^{-7} = 4.95\times10^{-7}$    (12) 

Now that Trigger S1’s TP LOM is known, it can be used to 
calculate its LOC Benefit using (5) and the result from (12): 

$LOC_{By_1} = 10\% \times 4.95\times10^{-7} = 4.95\times10^{-8}$    (13) 

With the Primary and Secondary Trigger calculations done, 
the final results for this LOM Scenario can now be estimated. 
First, sum the TP LOM of all primary and secondary triggers: 

$LOM_{TP\,Total} = 7.60\times10^{-6} + 1.90\times10^{-6} + 4.95\times10^{-7} = 9.995\times10^{-6}$    (14) 

The result from (14) is shown beyond the 3 significant digits 
that are used in this section to show that the value is not the 
same as the original LOM Scenario probability of $1\times10^{-5}$. 
The difference of $5\times10^{-9}$ is the portion of the total 
Primary Trigger FN probability from (11) that Trigger S1 also 
fails to detect ($5.00\times10^{-7} \times 1\%$), i.e. the False 
Negative LOC. Next, the LOC Benefit for this LOM Scenario is 
summed from (8), (10) and (13). 

$LOC_{B\,Total} = 6.84\times10^{-6} + 7.60\times10^{-7} + 4.95\times10^{-8} = 7.65\times10^{-6}$    (15) 

Finally, the aggregated AE for this LOM Scenario is 
simply the ratio of the Total LOC Benefit to the total TP LOM 
probability of the LOM Scenario: 

$AE = \frac{LOC_{Benefit}}{TP_{LOM}}$    (16) 

The AE for this LOM Scenario and Trigger suite is then 
calculated as: 

$Mode\ AE = \frac{7.65\times10^{-6}}{9.995\times10^{-6}} \times 100\% = 76.5\%$    (17) 

The value in (17) indicates that, in the mode case, this 
example Abort Trigger suite is 76.5% effective in enabling the 
crew to survive the failing launch vehicle in this Scenario. By 
repeating the above calculations with minimum and maximum 
ATWT and AE values (not performed here), one 
obtains a range of AEs that reflects the uncertainties in the 
ATWT and AE. The mean AE estimate would then be 
calculated using simple averaging if the ATWTs and AEs are 
either uniformly or triangularly distributed, as stated in 
Section III.H. When the above calculations are performed and 
probabilistically summed for all LOM Scenarios for a launch 
vehicle, this provides the total LOC Benefit estimates and total 
AE estimates for the entire suite of Abort Triggers for the 
human-rated launch vehicle. Put another way, these numbers 
estimate the value (importance, not merely a numerical figure) 
of Abort Triggers in enabling the crew to survive. 
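
The example can be reproduced with a few lines of Python. This is a minimal sketch that applies equations (1) through (5) to the fictitious Table I values. The function and variable names are illustrative and are not part of the AAM, which is an Excel workbook.

```python
def primary_tp_lom(split, lom, fn):
    """Equation (1): TP LOM of a primary trigger."""
    return split * lom * (1.0 - fn)


def loc_benefit(ae, tp_lom):
    """Equations (2) and (5): LOC Benefit = AE x TP LOM."""
    return ae * tp_lom


def primary_fn_probability(tp_lom, fn):
    """Equation (4): scenario probability missed by one primary trigger."""
    return tp_lom * fn / (1.0 - fn)


def secondary_tp_lom(split, fn, fn_total):
    """Equation (3): TP LOM of a secondary trigger."""
    return split * (1.0 - fn) * fn_total


# Fictitious inputs from Table I (Mode values only).
lom = 1e-5
primaries = {
    "P1": {"split": 0.80, "fn": 0.05, "ae": 0.90},
    "P2": {"split": 0.20, "fn": 0.05, "ae": 0.40},
}
secondaries = {"S1": {"split": 1.00, "fn": 0.01, "ae": 0.10}}

tp, benefit = {}, {}
fn_total = 0.0
for name, p in primaries.items():
    tp[name] = primary_tp_lom(p["split"], lom, p["fn"])
    benefit[name] = loc_benefit(p["ae"], tp[name])
    fn_total += primary_fn_probability(tp[name], p["fn"])
for name, s in secondaries.items():
    tp[name] = secondary_tp_lom(s["split"], s["fn"], fn_total)
    benefit[name] = loc_benefit(s["ae"], tp[name])

tp_total = sum(tp.values())
benefit_total = sum(benefit.values())
mode_ae = benefit_total / tp_total

print(f"TP LOM total      = {tp_total:.4g}")
print(f"LOC Benefit total = {benefit_total:.3g}")
print(f"Mode AE           = {mode_ae:.1%}")
```

The aggregate values agree with (14), (15) and (17). Repeating the run with the Minimum and Maximum inputs would give the bounding cases.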

Conversely, one can estimate the remaining LOC as the 
amount of total LOM that is not mitigated by the Abort 
Triggers. This can be subdivided into the portion of LOC due 
to the inability to detect the failures even in theory (the 
failure is not “covered” by detections), the portion due to 
state estimation problems (FN), and the portion due to the 
inability to escape in time when detected. The latter is an 
aggregate measure of the response effectiveness, which 
assesses the physics of the failure versus the performance of 
the launch abort system and detection time, as opposed to lack 
of detection coverage or failure to detect. 
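
For the fictitious example of Section IV.D, this subdivision can be written out as follows. The symbols $LOC_{FN}$ and $LOC_{AE}$ are introduced here only for illustration and do not appear in the analysis itself. Because the primary trigger splits in the example sum to 100%, the uncovered portion is zero. The unrounded total LOC Benefit of $7.6495\times10^{-6}$ is used so that the terms sum exactly.

```latex
\begin{align*}
LOC_{FN} &= FN_{x\,Total} \times FN_{\%y}
          = 5.00\times10^{-7} \times 1\% = 5.00\times10^{-9} \\
LOC_{AE} &= LOM_{TP\,Total} - LOC_{B\,Total}
          = 9.995\times10^{-6} - 7.6495\times10^{-6} = 2.3455\times10^{-6} \\
LOC_{FN} + LOC_{AE} &= 2.3505\times10^{-6}
          = 1\times10^{-5} - 7.6495\times10^{-6}
\end{align*}
```

In this example nearly all of the remaining LOC comes from the inability to escape in time after detection, with only a small portion due to False Negatives.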

E. Limitations & Caveats 

The biggest technical limitation and caveat to the sort of 
analysis described in this paper is the size of the uncertainties 
involved with these calculations. Each group performed their 
own calculations with their own uncertainties and assumptions, 
which are part of the overall Abort Trigger analysis process. 
Some of these have rather large uncertainties themselves, such 
as the probability of LOM calculations within S&MA PRA, the 
STE blast calculations, and the M&FM estimates of warning 
times. These uncertainties all have to be combined to yield 
something that has meaning when aggregated. In general, the 
group aimed for 5%, mode, mean, and 95% values. In practice, 
it is difficult to know how close the “best case” and “worst 
case” or “mode” values are to the ideal of 5%, mode, and 95% 
without performing full Monte Carlo simulations. At the 
“bottom” of the calculations are always engineering judgments 
being made by the relevant engineers and analysts, which must 
be vetted with other experts to ensure that they are reasonable. 

The analysis described here also requires significant 
resources. For a human-rated launch vehicle, with existing 
groups that perform related work that is already required for 
NASA systems, the resources were available. It was helpful 
that many of the groups already performed analyses that 
generated data similar to what was ultimately needed to 
perform the Abort Trigger effectiveness calculations and 
related LOC Benefit. Thus S&MA already performed PRAs 
that mainly needed to provide a bit more detail for some failure 
scenarios than they would otherwise have done. GN&C 
analyses of Abort Triggers did not require any new tools 
beyond those used for nominal analyses, though it did require 
using these tools in somewhat different ways than before. STE 
already performed blast overpressure, debris and fireball 
analyses, and mainly needed to structure those analyses in a 
way that enabled inputs of warning times from the M&FM 
group to provide appropriate outputs of crew survivability in 
differing conditions. The M&FM group, which is historically 
new within NASA MSFC, required the most “new” work, for 
the simple reason that this new approach of quantitative 
analysis of Abort Triggers had never been done before. This 
entailed development of the methodology and the Abort 
Analysis Matrix tool to mechanize the process, and also the 
establishment of new relationships to the other groups to enable 
this analysis to be performed. For other systems such as robotic 
spacecraft, these resources may not be available, and so the 
detailed process here would need to be simplified to enable it to 
occur. In the long run, SHM/FM analyses of the sort described 
here should become a standard feature of an SHM/FM tool 
suite, making it cost effective for most projects. 

Lastly, the analyses we have performed to date uncovered 
or made clear a number of issues that will need to be resolved 
in the future. One is the differentiation of crew survivability 
due to an effective abort, versus crew survivability simply due 
to an explosion being small or with debris by luck not hitting 
the MPCV. This is related to the over-simplified assumption 
that LOC occurs if a secondary Abort Trigger fails to detect an 
abort condition. The current method of attributing LOC Benefit 
to the first Abort Trigger that detects an abort condition in a 
LOM scenario needs further refinement. In other words, the 
more accurate method of calculating LOC Benefit is the “Delta 
LOC” methodology described previously, but this potentially 
more accurate method does not seem to provide a single 
absolute number for LOC Benefit, since it depends on the order 
of selection of previous Abort Triggers. Finally, issues such as 
the rogue booster described in Section IV.C have now been 
directly integrated into the analysis through improved 
abortability tables, as opposed to being addressed with a 
post-processing macro as was done for the SLS PDR [10]. 

V. Conclusion 

The theory of System Health Management and of its 
operational subset Fault Management indicates that quantitative 
analysis of FM Control Loops can be performed, with metrics 
related to state estimation and control. This theory has been 
successfully applied to the selection of the baseline set of 
Abort Triggers for the NASA SLS vehicle as a tool to improve the 
crew’s ability to escape from potentially catastrophic hazards 
while maximizing mission success. The selection of baseline 
Abort Triggers has been completed, in which the quantitative 
assessment of the Loss of Crew Benefit of an Abort Trigger 
suite played a significant role in the decision process. The 
process is now a standard part of the overall design process in 
SLS, and will likely be applied to similar programs in the 
future at NASA MSFC. Based on the successful experience of 
applying SHM/FM theory to this complex application, future 
improvements are envisioned to the methodology, to improve 
its technical accuracy and to reduce its future costs through the 
development of improved tools to perform these analyses. 

Acronym List 

AE      Abort Effectiveness 
AAM     Abort Analysis Matrix 
ARC     Ames Research Center 
ATWT    Abortability Table Warning Time 
BFF     Booster Free Flyer 
CCSE    Combined Control System Electronics 
CS      Core Stage 
C&W     Caution and Warning 
FC      Flight Computer 
FDIR    Failure Detection, Isolation, and Response 
FM      Fault Management 
FMCL    Fault Management Control Loops 
FMEA    Failure Mode and Effects Analysis 
FN      False Negative 
FP      False Positive 
FTS     Flight Termination System 
GFT     Goal-Function Tree 
GN&C    Guidance, Navigation and Control 
GRC     Glenn Research Center 
LAS     Launch Abort System 
M&FM    Mission and Fault Management 
MPCV    Multi-Purpose Crew Vehicle 
MSFC    Marshall Space Flight Center 
NASA    National Aeronautics and Space Administration 
LOC     Loss of Crew 
LOM     Loss of Mission 
LOME    Loss of Mission Environment 
PDR     Preliminary Design Review 
PRA     Probabilistic Risk Assessment 
RM      Redundancy Management 
S&MA    Safety and Mission Assurance 
SDQ     Sensor Data Qualification 
SHM     System Health Management 
SLS     Space Launch System 
STE     Structures and Environments 
TP      True Positive 
TN      True Negative 